Re: [Rdkit-discuss] Request for Assistance with MACCS 166 Fingerprint Calculation for 3D QSAR Study

2024-04-23 Thread Greg Landrum
Hi,

Please do not duplicate questions/posts between the mailing list and github
discussions. That's spamming the community.

-greg


On Tue, Apr 23, 2024 at 4:10 PM Ariadna Llop Peiró 
wrote:

> Hello everyone,
>
> I'm currently working with a dataset of chemical compounds, aiming to
> cluster them into different series to create a 3D-QSAR model. Up to this
> point, I've been using Morgan Fingerprints to generate the descriptors and
> cluster the compounds based on their Tanimoto Similarity:
>
> ```
> from rdkit import DataStructs
> from rdkit.Chem import AllChem
>
> # Generate fingerprint descriptor database
> fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in mols]
>
>
> # Calculate pairwise Tanimoto similarity between fingerprints
> similarity_matrix = []
> for i in range(len(fps)):
>     similarities = []
>     for j in range(len(fps)):
>         similarities.append(DataStructs.TanimotoSimilarity(fps[i], fps[j]))
>
>     similarity_matrix.append(similarities)
> ```
>
>
> With the similarity matrix, I applied hierarchical clustering based on a
> Tanimoto Similarity threshold to group similar compounds:
>
> ```
> import numpy as np
> from scipy.cluster import hierarchy
> from scipy.spatial.distance import squareform
>
> # Cluster based on Tanimoto similarity
> dists = 1 - np.array(similarity_matrix)
> hc = hierarchy.linkage(squareform(dists), method='single')
>
> # Specify a distance threshold or number of clusters
> # Adjust this value based on your dendrogram and similarity values
> threshold = 0.6
> clusters = hierarchy.fcluster(hc, threshold, criterion='distance')
> ```
>
> However, I'm not satisfied with the results and would like to experiment
> with MACCS Keys to see if they yield better clustering outcomes. Does
> anyone know how to cluster compounds using MACCS fingerprints? Any insights
> on the best approach to calculate similarities and cluster using these
> fingerprints would be highly appreciated.
>
> Thank you in advance for your suggestions!
>
> Ariadna Llop
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
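
For reference, the Morgan-based workflow quoted above carries over to MACCS keys almost unchanged. What follows is only a minimal sketch and not a reply from the thread: it assumes MACCSkeys.GenMACCSKeys for the fingerprints and SciPy for the clustering, and the four SMILES and the 0.6 threshold are placeholders.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

# Placeholder molecules (assumption); substitute your own dataset
smiles = ['CCO', 'CCCO', 'c1ccccc1', 'Cc1ccccc1']
mols = [Chem.MolFromSmiles(s) for s in smiles]

# MACCS keys come back as 167-bit vectors (bit 0 is unused)
fps = [MACCSkeys.GenMACCSKeys(m) for m in mols]

# Fill the symmetric similarity matrix one row at a time
n = len(fps)
similarity_matrix = np.ones((n, n))
for i in range(1, n):
    sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
    similarity_matrix[i, :i] = sims
    similarity_matrix[:i, i] = sims

# Same clustering step as in the quoted message
dists = 1.0 - similarity_matrix
hc = hierarchy.linkage(squareform(dists), method='single')
clusters = hierarchy.fcluster(hc, 0.6, criterion='distance')
print(clusters)
```

Because MACCS keys are short and fairly coarse, similarity values tend to be higher than with Morgan fingerprints, so the distance threshold will probably need to be retuned.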


Re: [Rdkit-discuss] Strange behaviour for GetSubstructMatches with dative bonds

2024-03-20 Thread Greg Landrum
For what it's worth, this one works too:
m.GetSubstructMatches(Chem.MolFromSmarts('P1->[Zr+3]<-C1'))

It looks like a problem in the way ring closure bonds are being handled in
the SMARTS parser.
Jan: would you mind creating an issue for this in github?

-greg


On Wed, Mar 20, 2024 at 3:30 PM Jan Halborg Jensen 
wrote:

> The following finds no matches:
>
> m = Chem.MolFromSmiles('C1P->[Zr+3]<-1')
> m.GetSubstructMatches(Chem.MolFromSmarts('C1P->[Zr+3]<-1'))
>
> But all these work:
>
> m.GetSubstructMatches(Chem.MolFromSmiles('C1P->[Zr+3]<-1'))
>
> m.GetSubstructMatches(Chem.MolFromSmarts('[*]->[Zr+3]'))
>
> m = Chem.MolFromSmiles('C1P-[Zr+3]-1')
> m.GetSubstructMatches(Chem.MolFromSmarts('C1P-[Zr+3]-1'))
>
>
> Is this a bug, or is there something I’m missing with regard to the first
> case?
>
> Best regards, Jan


[Rdkit-discuss] Registration for the 2024 RDKit UGM is now open

2024-03-12 Thread Greg Landrum
Dear all,

The (free) registration for the 2024 RDKit UGM, being held from 11-13
September at the ETH in Zurich, Switzerland, is now open:
https://www.eventbrite.com/e/860637719587

You can submit proposals to do talks, tutorials, lightning talks, and
posters here:
https://forms.gle/5GK5ej7hCdPguwKz8

As in the past couple of years, we will stream the talks for people who
cannot attend in person.

Best regards,
-greg


Re: [Rdkit-discuss] Ligand conversion problem from 2D to 3D

2024-02-06 Thread Greg Landrum
Hi Amy,

On Tue, Feb 6, 2024 at 8:20 PM He, Amy  wrote:

>
>
> Emre, great to hear from you. I also just wanted to say that not all
> entries in ZINC can be transformed into 3D structures. We encountered a
> couple of instances where the annotated stereo is nonphysical, especially
> at closely-spaced stereo centers in small cycles. I had thought to design a
> check to capture these instances, but eventually I just gave up and
> discarded the entries because no other available methods (including ETKDG)
> or software can build 3D structures for 3D calculations. It would still be
> helpful to consider a check even for 2D calculations.
>
>
Yeah, this is a difficult one. It would be nice to have a check to catch at
least the "simple" cases like these cage structures, where only one relative
stereo configuration is possible, but I haven't managed to find one yet.

-greg


Re: [Rdkit-discuss] Ligand conversion problem from 2D to 3D

2024-02-06 Thread Greg Landrum
Hi Emre,

Both of those compounds look like they have conflicting stereochemistry
information in the ring systems, i.e. the stereo which is specified cannot
actually exist. There's something else going on as well (that looks like a
bug) but this is already a big enough problem.

The easiest thing to do with these compounds (or things which have this
kind of problem) is to just disable the stereochemistry entirely by
removing all instances of the @ symbol from the input strings:
In [26]: ps = rdDistGeom.ETKDGv3()

In [27]: m = Chem.AddHs(Chem.MolFromSmiles('COc1cc(Br)c(C[N+]2(CCOCC[C@H]3CC[C@H]4C[C@H]3C4(C)C)CCOCC2)cc1OC'.replace('@','')))

In [28]: rdDistGeom.EmbedMolecule(m,ps)
Out[28]: 0

In [29]: m = Chem.AddHs(Chem.MolFromSmiles('CC(C)(C)OC(=O)N[C@H](C[Si](C)(C)C)C(=O)N1CCC[C@H]1C(=O)N[C@@H](N)B1O[C@@H]2C[C@H]3C[C@@H](C3(C)C)[C@]2(C)O1'.replace('@','')))

In [30]: rdDistGeom.EmbedMolecule(m,ps)
Out[30]: 0

-greg
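
The workaround above can be wrapped into a small fallback routine. This is only a sketch under stated assumptions: embed_with_stereo_fallback is a hypothetical helper (not an RDKit function), the seed value is arbitrary, and the alanine SMILES is just a stand-in for a well-behaved input.

```python
from rdkit import Chem
from rdkit.Chem import rdDistGeom


def embed_with_stereo_fallback(smiles, seed=42):
    """Return (mol_with_conformer, stereo_kept), or (None, False) if both attempts fail.

    Hypothetical helper for illustration; not part of the RDKit.
    """
    ps = rdDistGeom.ETKDGv3()
    ps.randomSeed = seed  # fixed seed so that runs are reproducible

    # First attempt: keep the stereochemistry as annotated
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    if rdDistGeom.EmbedMolecule(mol, ps) != -1:
        return mol, True

    # Second attempt: drop every '@' so that no stereo is specified
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles.replace('@', '')))
    if rdDistGeom.EmbedMolecule(mol, ps) != -1:
        return mol, False

    return None, False


mol, stereo_kept = embed_with_stereo_fallback('C[C@H](N)C(=O)O')
print(mol.GetNumConformers(), stereo_kept)
```

Returning the stereo_kept flag makes it easy to log which structures were embedded without their annotated stereochemistry, so they can be reviewed later.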



On Tue, Feb 6, 2024 at 9:42 AM Emre Apaydın 
wrote:

> Thank you so much for your help. I managed to convert most of the
> molecules from 2D to 3D, but no matter which ETKDG version or embedding
> parameters I try, I cannot convert these two molecules: ZINC000101210593 and
> ZINC000196058327. Is there an alternative feature or method I can try? I
> would be grateful if you could help me. Thank you!
>
> On Thu, Jan 11, 2024 at 1:58 AM He, Amy wrote:
>
>> Hi Emre!
>>
>>
>>
>> You can get more detailed info on failed conformer generations through
>> *rdDistGeom.EmbedFailureCauses*, see:
>>
>>
>> https://greglandrum.github.io/rdkit-blog/posts/2023-05-17-understanding-confgen-errors.html
>>
>>
>>
>> Bests,
>>
>>
>>
>>
>>
>> --
>>
>> Amy He
>>
>> Hadad Lab @ OSU
>>
>> he.1...@osu.edu
>>
>>
>>
>> *From: *Emre Apaydın 
>> *Date: *Wednesday, January 10, 2024 at 8:47 AM
>> *To: *rdkit-discuss@lists.sourceforge.net <
>> rdkit-discuss@lists.sourceforge.net>
>> *Subject: *[Rdkit-discuss] Ligand conversion problem from 2D to 3D
>>
>> Hello,
>>
>>
>>
>> I want to convert the 2D ligands I downloaded in SDF format from the ZINC
>> library to 3D, but almost half of them are not converted. Some of
>> them are ZINC08214373, ZINC01530666, and ZINC85545180; 208 ligands in total
>> fail in this way. When I run the script, I do not get
>> any warning or error in the IDE. When I look at the output of my try/except
>> blocks, I see:
>>
>> ZINC08214373.sdf : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed
>> ZINC08214373.sdf : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed
>>
>> This is the output for every ligand that is not converted to 3D.
>> When I try different methods, the ligands are converted to 3D. I wonder if
>> there is something missing or wrong with my script. I would be grateful if
>> you can help me.
>>
>>
>>
>> Thank you!
>>
>>
>> ```
>> import os
>>
>> from rdkit import Chem
>> from rdkit.Chem import rdDistGeom
>> from rdkit.Chem import rdForceFieldHelpers
>> from rdkit.Chem import rdPartialCharges
>>
>> ligands_dir = "ligands"
>> output_dir = "new_ligands"
>> status_file = "process_status.txt"
>>
>> if not os.path.exists(output_dir):
>>     os.makedirs(output_dir)
>>
>> sdf_files = [f for f in os.listdir(ligands_dir) if f.endswith(".sdf")]
>>
>> with open(status_file, 'w') as status:
>>     for sdf_file in sdf_files:
>>         input_path = os.path.join(ligands_dir, sdf_file)
>>         output_path = os.path.join(output_dir, sdf_file)
>>         mol = Chem.MolFromMolFile(input_path)
>>
>>         # Add hydrogens
>>         try:
>>             mol = Chem.AddHs(mol, addCoords=True)
>>         except Exception:
>>             status.write(f"{sdf_file} : Chem.AddHs(mol) = Failed\n")
>>             continue
>>
>>         # 3D embedding (EmbedMolecule returns -1 on failure)
>>         etkdgv3 = rdDistGeom.ETKDGv3()
>>         embed_status = rdDistGeom.EmbedMolecule(mol, etkdgv3)
>>         if embed_status == -1:
>>             status.write(f"{sdf_file} : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed\n")
>>
>>         # Compute Gasteiger charges
>>         try:
>>             rdPartialCharges.ComputeGasteigerCharges(mol)
>>         except Exception:
>>             status.write(f"{sdf_file} : rdPartialCharges.ComputeGasteigerCharges(mol) = Failed\n")
>>
>>         # UFF energy minimization (also runs, and fails, when embedding failed)
>>         try:
>>             rdForceFieldHelpers.UFFOptimizeMolecule(mol)
>>         except Exception:
>>             status.write(f"{sdf_file} : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed\n")
>>
>>         Chem.MolToMolFile(mol, output_path)
>> ```
>>

Re: [Rdkit-discuss] V2000 to V3000 enhanced stereo question

2024-01-31 Thread Greg Landrum
Thanks for that example Nick.
We can't handle this automatically since there are multiple interpretations
of what the chiral flag means, but I think some relatively straightforward
post-processing can do what you're looking for.
https://gist.github.com/greglandrum/f85097a8489ba4a5825b0981b1fd2408

If people think it's useful, this is something which we could add to the
RDKit itself.

-greg
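
To illustrate the kind of post-processing meant here, the following is a rough sketch and does not reproduce the contents of the gist. It assumes one possible interpretation of the chiral flag: when the flag is set, all tetrahedral stereocenters go into one absolute (STEABS) group; otherwise they go into a single AND group (STERAC). The helper name add_stereo_group_from_chiral_flag and the example SMILES are made up for this example.

```python
from rdkit import Chem


def add_stereo_group_from_chiral_flag(mol, chiral_flag):
    """Hypothetical helper: put all tetrahedral stereocenters into one stereo group."""
    centers = [a.GetIdx() for a in mol.GetAtoms()
               if a.GetChiralTag() in (Chem.ChiralType.CHI_TETRAHEDRAL_CW,
                                       Chem.ChiralType.CHI_TETRAHEDRAL_CCW)]
    rwmol = Chem.RWMol(mol)
    if centers:
        # Assumed interpretation: flag set -> absolute, flag unset -> racemic
        group_type = (Chem.StereoGroupType.STEREO_ABSOLUTE if chiral_flag
                      else Chem.StereoGroupType.STEREO_AND)
        group = Chem.CreateStereoGroup(group_type, rwmol, centers)
        rwmol.SetStereoGroups([group])
    return rwmol.GetMol()


# Stand-in for the 2,4-dimethylpiperidine drawn in the quoted mol blocks
mol = Chem.MolFromSmiles('C[C@H]1C[C@@H](C)CCN1')
abs_block = Chem.MolToV3KMolBlock(add_stereo_group_from_chiral_flag(mol, True))
rac_block = Chem.MolToV3KMolBlock(add_stereo_group_from_chiral_flag(mol, False))
print(rac_block)
```

For a molecule read from a mol block, the flag itself should be available as the _MolFileChiralFlag property, so it can be passed to the helper via mol.GetIntProp('_MolFileChiralFlag') after checking mol.HasProp.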



On Wed, Jan 31, 2024 at 2:53 PM Tomkinson, Nicholas <
nick.tomkin...@astrazeneca.com> wrote:

> Hi Greg – sure. So -
>
> If I have a V2000 with or without the chiral flag:
>
>
>   ACCLDraw01312413482D
>
>   8  8  0  0  1  0  0  0  0  0999 V2000
>     4.6334   -6.5969    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     5.6563   -6.0064    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
>     6.6791   -6.5969    0.0000 N   0  0  3  0  0  0  0  0  0  0  0  0
>     6.6791   -7.7781    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     5.6563   -8.3686    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     4.6334   -7.7781    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
>     5.6563   -4.8257    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     3.6109   -8.3684    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0  0  0  0
>   3  2  1  0  0  0  0
>   4  3  1  0  0  0  0
>   5  4  1  0  0  0  0
>   1  6  1  0  0  0  0
>   6  5  1  0  0  0  0
>   2  7  1  1  0  0  0
>   6  8  1  1  0  0  0
> M  END
>
>
>   ACCLDraw01312413492D
>
>   8  8  0  0  0  0  0  0  0  0999 V2000
>     4.6334   -6.5969    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     5.6563   -6.0064    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
>     6.6791   -6.5969    0.0000 N   0  0  3  0  0  0  0  0  0  0  0  0
>     6.6791   -7.7781    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     5.6563   -8.3686    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     4.6334   -7.7781    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
>     5.6563   -4.8257    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     3.6109   -8.3684    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0  0  0  0
>   3  2  1  0  0  0  0
>   4  3  1  0  0  0  0
>   5  4  1  0  0  0  0
>   1  6  1  0  0  0  0
>   6  5  1  0  0  0  0
>   2  7  1  1  0  0  0
>   6  8  1  1  0  0  0
> M  END
>
> I’d expect the enhanced collections to be output in V3000 format. In this
> case the chiral flag is also set but that’s not a biggy for me. (I wish the
> chiral flag didn’t exist in V3000.)
>
>
>   ACCLDraw01312413472D
>
>   0  0  0 0  0999 V3000
> M  V30 BEGIN CTAB
> M  V30 COUNTS 8 8 0 0 1
> M  V30 BEGIN ATOM
> M  V30 1 C 4.6334 -6.5969 0 0
> M  V30 2 C 5.6563 -6.0064 0 0 CFG=2
> M  V30 3 N 6.6791 -6.5969 0 0 CFG=3
> M  V30 4 C 6.6791 -7.7781 0 0
> M  V30 5 C 5.6563 -8.3686 0 0
> M  V30 6 C 4.6334 -7.7781 0 0 CFG=1
> M  V30 7 C 5.6563 -4.8257 0 0
> M  V30 8 C 3.6109 -8.3684 0 0
> M  V30 END ATOM
> M  V30 BEGIN BOND
> M  V30 1 1 1 2
> M  V30 2 1 3 2
> M  V30 3 1 4 3
> M  V30 4 1 5 4
> M  V30 5 1 1 6
> M  V30 6 1 6 5
> M  V30 7 1 2 7 CFG=1
> M  V30 8 1 6 8 CFG=1
> M  V30 END BOND
> M  V30 BEGIN COLLECTION
> M  V30 MDLV30/STEABS ATOMS=(2 2 6)
> M  V30 END COLLECTION
> M  V30 END CTAB
> M  END
>
>
>   ACCLDraw01312413492D
>
>   0  0  0 0  0999 V3000
> M  V30 BEGIN CTAB
> M  V30 COUNTS 8 8 0 0 0
> M  V30 BEGIN ATOM
> M  V30 1 C 4.6334 -6.5969 0 0
> M  V30 2 C 5.6563 -6.0064 0 0 CFG=2
> M  V30 3 N 6.6791 -6.5969 0 0 CFG=3
> M  V30 4 C 6.6791 -7.7781 0 0
> M  V30 5 C 5.6563 -8.3686 0 0
> M  V30 6 C 4.6334 -7.7781 0 0 CFG=1
> M  V30 7 C 5.6563 -4.8257 0 0
> M  V30 8 C 3.6109 -8.3684 0 0
> M  V30 END ATOM
> M  V30 BEGIN BOND
> M  V30 1 1 1 2
> M  V30 2 1 3 2
> M  V30 3 1 4 3
> M  V30 4 1 5 4
> M  V30 5 1 1 6
> M  V30 6 1 6 5
> M  V30 7 1 2 7 CFG=1
> M  V30 8 1 6 8 CFG=1
> M  V30 END BOND
> M  V30 BEGIN COLLECTION
> M  V30 MDLV30/STERAC1 ATOMS=(2 6 2)
> M  V30 END COLLECTION
> M  V30 END CTAB
> M  END
>
> Cheers
>
> Nick
>
> *From:* Greg Landrum 
> *Sent:* Wednesday, Janu

Re: [Rdkit-discuss] V2000 to V3000 enhanced stereo question

2024-01-31 Thread Greg Landrum
Hi Nick,

Can you provide an example of exactly what you would like to have happen?

-greg


On Tue, Jan 30, 2024 at 5:46 PM Tomkinson, Nicholas <
nick.tomkin...@astrazeneca.com> wrote:

> I am trying to convert a simple V2000 molfile, with or without the chiral
> flag, into a V3000 molfile, but this does not create an enhanced stereo
> collection in the V3000 molfile. This is a requirement for another
> application that does not handle V2000/V3000 mixtures well. Is there any way
> of forcing the writing of the enhanced collection in this context?
>
>
>
> Thanks
>
>
>
> Nick
>
>


Re: [Rdkit-discuss] toxprint -- CSRML

2023-12-18 Thread Greg Landrum
Hi Marawan,

We don't currently support CSRML. It is certainly an interesting and
flexible format, so it would be cool to have, but it would be a fair amount
of work to implement.

-greg


On Tue, Dec 19, 2023 at 4:19 AM Marawan Hussien via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> I am wondering whether standard RDKit supports CSRML. I would like to encode
> the ToxPrint chemotypes as binary fingerprints for a set of molecules to
> train on.
>
> Thanks,
> Marawan


[Rdkit-discuss] Save the Date! 2024 RDKit UGM

2023-12-09 Thread Greg Landrum
Dear all,

The 2024 RDKit UGM will take place from 11-13 September in Zurich,
Switzerland.

We'll post more information and open registration in Q1 of next year.

Best regards,
-greg


Re: [Rdkit-discuss] mol to smiles code

2023-10-24 Thread Greg Landrum
I'm not sure exactly what you're looking for, but all of the code for
reading and writing SMILES is here:
https://github.com/rdkit/rdkit/tree/master/Code/GraphMol/SmilesParse

-greg

On Tue, Oct 24, 2023 at 11:51 AM Eduardo Mayo 
wrote:

> Hello all,
>
> I hope you all are doing well.
>
> I am struggling to find the code where the SMILES-to-mol and
> mol-to-SMILES translation happens. Can someone point me in the right direction?
>
> kind regards,
> eduardo
>


Re: [Rdkit-discuss] RegistrationHash in C++/Java

2023-09-07 Thread Greg Landrum
Hi Giammy,

We currently only have the Python implementation. Doing a C++ version is on
my ToDo list, but I'm not sure when we'll get there.

best regards,
-greg


On Thu, Sep 7, 2023 at 1:17 PM Gianmarco Ghiandoni 
wrote:

> Hello all,
>
> I've been testing the Python module from rdkit.Chem import
> RegistrationHash for some time now and I would like to use it in Java
> too. I browsed the RDKit repository but I could not find it implemented in
> C++, and therefore, not available in the Java JARs.
>
> Am I missing it from somewhere else or is it just implemented in Python?
>
> Thanks,
>
> Giammy
>
> --
> *Gianmarco*


Re: [Rdkit-discuss] Distinguishing bridgeheads from ring-fusions with SMARTS

2023-08-25 Thread Greg Landrum
If you're willing to live with the RDKit's definition of bridgehead (see
below), then there is built-in functionality you can use:

from rdkit import Chem
from rdkit.Chem import rdqueries
qa = rdqueries.IsBridgeheadQueryAtom()
mol = Chem.MolFromSmiles('C1CC2CCC1C2')
mol.GetAtomsMatchingQuery(qa)


That last call returns a sequence with the matching atoms.

The RDKit bridgehead definition:
  // at least three ring bonds, all ring bonds in a ring which shares at
  // least two bonds with another ring involving this atom
 is definitely not perfect, primarily because of the use of the ring
systems, but it's the best that we were able to come up with while keeping
things efficient.
There's some discussion here https://github.com/rdkit/rdkit/pull/6061 and
in the linked issue.

-greg
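
As a quick illustration of the built-in query described above, here is a minimal sketch comparing a bridged and a fused bicycle. The helper bridgehead_indices and the choice of test molecules are assumptions made for this example; only IsBridgeheadQueryAtom and GetAtomsMatchingQuery come from the message.

```python
from rdkit import Chem
from rdkit.Chem import rdqueries


def bridgehead_indices(smiles):
    """Hypothetical helper: indices of atoms matching the RDKit bridgehead query."""
    mol = Chem.MolFromSmiles(smiles)
    qa = rdqueries.IsBridgeheadQueryAtom()
    return sorted(a.GetIdx() for a in mol.GetAtomsMatchingQuery(qa))


# Norbornane: the two rings share two bonds, so there are bridgeheads
norbornane = bridgehead_indices('C1CC2CCC1C2')
# Decalin: the rings share a single bond (ring fusion), so there are none
decalin = bridgehead_indices('C1CCC2CCCCC2C1')
print(norbornane, decalin)
```

The query needs ring information, which is available here because MolFromSmiles sanitizes the molecule by default.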


On Fri, Aug 25, 2023 at 11:23 PM Wim Dehaen  wrote:

> greetings all,
> i have thought about the problem some more, and in the end came to the
> conclusion that looping through all rings really is necessary. In the gist
> below you can see the adjusted code, making use of Pat Walters' method for
> finding all rings. Apologies for the code being messy.
> https://gist.github.com/dehaenw/41eb8e4c39c1158e88b36c6dfc2606d8
> fortunately, this one manages to also detect these difficult cases, see
> below:
> i did not check how fast it is, but i guess it will be a fair bit slower.
>
> best wishes,
> wim
>
> On Fri, Aug 25, 2023 at 8:28 PM Wim Dehaen  wrote:
>
>> Dear Andreas,
>> that's a good find. i agree the breaking case can be considered
>> bridgehead structure, as it's essentially bicyclo-[3.2.1]-octane plus an
>> extra bond. I need to think about this some more, but it might be related
>> to getting the ringinfo as SSSR instead of exhaustively. The best solution
>> may therefore be to just prune non ring atoms from the graph, enumerate all
>> rings and check it really exhaustively.
>> FWIW: rdMolDescriptors.CalcNumBridgeheadAtoms(mol) returns 0 for mol =
>> Chem.MolFromSmiles("C1CC2C3C2C1C3") too, so this may be an rdkit bug on
>> this end.
>> best wishes
>> wim
>>
>> On Fri, Aug 25, 2023 at 5:20 PM Andreas Luttens <
>> andreas.lutt...@gmail.com> wrote:
>>
>>> Dear Wim,
>>>
>>> Thanks for your reply!
>>>
>>> Apologies for the delay, finally got time to pick up this project again.
>>>
>>> Your suggestion works great, though I have found some cases where it
>>> breaks. For instance the molecule:
>>>
>>> mol = Chem.MolFromSmiles("C1CC2C3C2C1C3")
>>>
>>> It seems, in this case, a bridgehead atom is also a fused-ring atom.
>>> Maybe these looped compounds have too complex topology for this type of
>>> analysis.
>>>
>>> I don't see a straight way forward to identify just the bridgehead atoms.
>>>
>>> Best wishes,
>>> Andreas
>>>
>>> On Sat, Dec 3, 2022 at 12:53 PM Wim Dehaen  wrote:
>>>
 Hi Andreas,
 I don't have a good SMARTS pattern available for this but here is a
 function that should return bridgehead idx and not include non bridgehead
 fused ring atoms:

 ```
 def return_bridgeheads_idx(mol):
     bh_list=[]
     intersections=[]
     sssr_idx = [set(x) for x in list(Chem.GetSymmSSSR(mol))]
     for i,ring1 in enumerate(sssr_idx):
         for j,ring2 in enumerate(sssr_idx):
             if i>j:
                 intersections+=[ring1.intersection(ring2)]
     for iidx in intersections:
         if len(iidx)>2: #condition for bridgehead
             for idx in iidx:
                 neighbors = [a.GetIdx() for a in mol.GetAtomWithIdx(idx).GetNeighbors()]
                 bh_list+=[idx for nidx in neighbors if nidx not in iidx]
     return tuple(set(bh_list))
 ```

 Here are 6 test molecules:

 ```
 mol1 = Chem.MolFromSmiles("C1CC2CCC1C2")
 mol2 = Chem.MolFromSmiles("C1CC2C1C1CCC2C1")
 mol3 = Chem.MolFromSmiles("N1(CC2)CCC2CC1")
 mol4 = Chem.MolFromSmiles("C1CCC12C2")
 mol5 = Chem.MolFromSmiles("C1CC2C1C2")
 mol6 = Chem.MolFromSmiles("C1CCC(C(CCC3)C23)C12")
 for mol in [mol1,mol2,mol3,mol4,mol5,mol6]:
     print(return_bridgeheads_idx(mol))
 ```

 giving the expected answer:

 (2, 5)
 (4, 7)
 (0, 5)
 ()
 ()
 ()

 hope this is helpful!

 best wishes
 wim

 On Sat, Dec 3, 2022 at 8:34 AM Andreas Luttens <
 andreas.lutt...@gmail.com> wrote:

> Dear users,
>
> I am trying to identify bridgehead atoms in multi-looped ring systems.
> The issue I have is that it can be sometimes difficult to distinguish these
> atoms from ring-fusion atoms. The pattern I used (see below) looks for
> atoms that are part of three rings but cannot be bonded to an atom that
> also fits this description, in order to avoid ring-fusion atoms. The code
> works, except for cases where bridgehead atoms are bonded to a ring-fusion
> atom.
>
> *PASS:*
> pattern 

Re: [Rdkit-discuss] atom indexing

2023-06-19 Thread Greg Landrum
Hi Ling,

On Mon, Jun 19, 2023 at 3:03 AM Ling Chan  wrote:

>
> I have some questions about atom indexing. I wonder if you could help me?
>
>    1. In m3=Chem.CombineMols(m1,m2), is it guaranteed that the atom
>    indices in m3 are equivalent to the indices in m1 followed by the
>    indices in m2?
>
Yes

>
>    2. If I construct an editable mol from m1, are the atom indices in the
>    editable mol equivalent to those in m1? And when I convert the editable
>    mol back, I suppose the atom indexing is also preserved?
>
Yes

>
>    3. Same as #2, but for an RWMol instead of an editable mol.
>
Yes

>
>    4. If I delete an "F" atom from an editable mol, is there a way to
>    mark the atom in the new mol that was originally bonded to the "F"? I mean,
>    if I get its atom index before the deletion, I suppose it won't be
>    preserved.
>
You can set a property on the neighboring atom with something like:
atom.SetProp("F_Neighbor","1")

>
>    5. Similar to #4, but for DeleteSubstructs.
>
Same answer: you can always use SetProp


> Alternatively, if there is a way to mark atoms, I don't need the atom
> indices anyway.
>

As mentioned, SetProp is great for this. Note that using the property
interface is slower than just relying on the indexing remaining the same,
which you can do in the first two cases.

best regards,
-greg
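
The two approaches discussed above can be tried out with a short sketch. The molecules and the way the fluorine is located are assumptions made for this example; CombineMols, RWMol, and SetProp are the calls mentioned in the thread.

```python
from rdkit import Chem

# Case 1: CombineMols keeps the atoms of m1 first, followed by those of m2
m1 = Chem.MolFromSmiles('CCO')
m2 = Chem.MolFromSmiles('c1ccccc1N')
m3 = Chem.CombineMols(m1, m2)
symbols_m3 = [a.GetSymbol() for a in m3.GetAtoms()]
symbols_expected = ([a.GetSymbol() for a in m1.GetAtoms()] +
                    [a.GetSymbol() for a in m2.GetAtoms()])
print(symbols_m3 == symbols_expected)

# Case 4: tag the neighbor of F before deleting it, since indices may shift
rw = Chem.RWMol(Chem.MolFromSmiles('FC(C)O'))
f_idx = [a.GetIdx() for a in rw.GetAtoms() if a.GetSymbol() == 'F'][0]
for nbr in rw.GetAtomWithIdx(f_idx).GetNeighbors():
    nbr.SetProp('F_Neighbor', '1')
rw.RemoveAtom(f_idx)

# Find the tagged atom again by property rather than by its old index
tagged = [a.GetIdx() for a in rw.GetAtoms() if a.HasProp('F_Neighbor')]
print(tagged, rw.GetAtomWithIdx(tagged[0]).GetSymbol())
```

Looking atoms up by property costs a scan over the molecule, which is the overhead mentioned above compared with relying on stable indices.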


Re: [Rdkit-discuss] SCSR

2023-04-28 Thread Greg Landrum
Hi Susan,

The RDKit does not currently support SCSR.

-greg



On Fri, 28 Apr 2023 at 15:07, Susan Leung  wrote:

> Hi all,
>
> I am trying to read in some Self-Contained Sequence Representation (SCSR)
> structures
> https://doi.org/10.1021/ci2001988
>
> But I am encountering some issues. I just wanted to clarify, does RDKit
> support this representation?
>
> Many thanks!
>
> Susan
>


[Rdkit-discuss] 2023 RDKit UGM registration open

2023-03-01 Thread Greg Landrum
Dear all,

This year's RDKit UGM will take place in Mainz, Germany from 20-22
September. Paul Czodrowski is this year's organizer.

As last year, we're planning the UGM as an in-person event with a live
stream of the sessions for people who can't make it in person.

Free registration for both in-person and virtual attendance is open here:
https://www.eventbrite.com/e/12th-rdkit-ugm-2023-tickets-566636253287


If you're interested in contributing a talk, poster, lightning talk,
tutorial, etc. to the UGM, please submit your ideas here:
https://forms.gle/P2Xt7ag9fQMSkAJt5

We're really looking forward to another great UGM and to getting to see
some of you in Mainz!

Best regards,
Greg and Paul


Re: [Rdkit-discuss] MolToMolBlock problem

2023-02-22 Thread Greg Landrum
Hi Ling,

I can't reproduce this problem on windows using the most recent version of
the RDKit.
Which version of the RDKit are you using and how did you install it?

Please also share exactly what you see for an error message.

-greg


On Tue, Feb 21, 2023 at 7:03 AM Ling Chan  wrote:

> Dear colleagues,
>
> Don't know if this is a bug, or if my input molecule is not good. I
> suspect that it is the former.
>
> If you run the following on the file "full.sdf", it will crash at the
> MolToMolBlock line.
>
> for m1 in Chem.SDMolSupplier(inputsdf, removeHs=False):
>     m2=Chem.RemoveHs(m1)
>     print (Chem.MolToMolBlock(m2))
>
> You can confirm that the problem is due to the stereo definition of the
> double bond, since if you edit the bond line " 4 5 2 3" to " 4 5 2 0" it
> will not crash.
>
> I tried to simplify the situation by boiling the molecule down to
> "simple.sdf". Unfortunately it does not crash any more.
>
> Thanks.
>
> Ling
>
>
>


Re: [Rdkit-discuss] Inconsistent GETAWAY descriptors

2023-02-22 Thread Greg Landrum
Hi Joao,

I can't reproduce that behavior on Windows using the most recent
conda-forge version of the RDKit: I always get the same values for the
first few GETAWAY descriptors and numbers 245 and 246.

Which operating system/RDKit version are you using?

-greg


On Fri, Feb 17, 2023 at 9:45 PM J Sousa  wrote:

> Hi,
> I'm getting different results almost every time I run the script below.
> Sometimes the descriptors in positions 245 and 246 get different very high
> values.
> The input.sdf file goes attached.
>
> import rdkit
> from rdkit import Chem
> from rdkit.Chem import Descriptors, rdMolDescriptors
> suppl = Chem.SDMolSupplier('input.sdf', removeHs = False)
> foutput = open("output.txt", "w")
> for mol in suppl:
>     descriptorsGETAWAY=rdMolDescriptors.CalcGETAWAY(mol)
>     for item in descriptorsGETAWAY:
>         foutput.write("%s\n" % item)
> foutput.close()
>
>
> What is wrong?
> Thanks,
> Joao Sousa


Re: [Rdkit-discuss] SMARTS: "NOT Hydrogen" wildcard?

2023-01-30 Thread Greg Landrum
Hi Thomas,

* in SMARTS just means "any atom".
[!H], for historical reasons, means "an atom that does not have exactly one
hydrogen" (i.e. it matches CH2 and CH3, but not CH).
You want [!#1], that is "not hydrogen".

-greg
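
The difference between the wildcard and [!#1] only shows up once hydrogens are explicit atoms in the graph. A small sketch, assuming ethanol with added hydrogens as the test molecule (count_matches is a hypothetical helper for this example):

```python
from rdkit import Chem

# Ethanol with explicit hydrogens: 3 heavy atoms + 6 hydrogens
mol = Chem.AddHs(Chem.MolFromSmiles('CCO'))


def count_matches(smarts):
    """Hypothetical helper: number of unique matches of a SMARTS in mol."""
    return len(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))


n_any = count_matches('*')                  # every atom, hydrogens included
n_heavy = count_matches('[!#1]')            # only atoms that are not hydrogen
n_any_ch3 = count_matches('*[CH3]')         # '*' can also be a methyl hydrogen
n_heavy_ch3 = count_matches('[!#1][CH3]')   # heavy-atom neighbor of a CH3 only
print(n_any, n_heavy, n_any_ch3, n_heavy_ch3)
```

On hydrogen-suppressed molecules the first two counts coincide, which is why the wildcard often appears to mean "any heavy atom".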


On Mon, Jan 30, 2023 at 5:40 PM Thomas  wrote:

> I thought that the wildcard * would match any atom except hydrogen, but
> that's true unless hydrogens are explicit in the molecule
>
> I have some patterns in the form of SMILES with wildcards and implicit
> hydrogens. For example C* means "terminal carbons" only.
> (" * "  stands for any atom except hydrogen)
>
> I want to transform this SMILES in SMARTS, if I just write:
>
> smarts = rdkit.MolFromSmarts('*C')
>
> the smarts I get matches any C with AT LEAST one non-hydrogen bond (not
> EXACTLY one).
>
> If I add explicit hydrogens to the smarts (and to the molecules to be
> tested)
>
> smartsH = rdkit.AddHs(smarts)
> rdkit.MolToSmiles(smartsH)
> '*C([H])([H])[H]'
>
> I get this pattern where the wildcard matches ANY atom including hydrogen
> (it matches with the single carbon atom).
>
> Basically I am trying to get the SMARTS *C[H3] starting from the
> respective SMILES *C. Is there a way?
>
> I've already tried to replace the * with a [!H] (NOT hydrogen) with no
> luck.
> Thanks to anyone :)
> Thomas


Re: [Rdkit-discuss] Changes in morgan fingerprint code?

2023-01-12 Thread Greg Landrum
Hi Eric,

That would be due to the fix for this bug:
https://github.com/rdkit/rdkit/issues/5036
If you were generating the fingerprints on "normal" (i.e.
hydrogen-suppressed) graphs, you wouldn't notice this one, but the fact
that you add the Hs before generating the fingerprint causes you to notice
it.

Just as an FYI: the best easy way, by far, to keep track of whether or not
you've seen a particular molecule is to use the SMILES.

-greg
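
A sketch of the SMILES-based bookkeeping suggested above, assuming canonical SMILES of the hydrogen-suppressed molecule as the key. The function name molecule_key and the input list are illustrative only.

```python
from rdkit import Chem


def molecule_key(mol):
    """Hypothetical helper: canonical SMILES of the hydrogen-suppressed molecule."""
    return Chem.MolToSmiles(Chem.RemoveHs(mol))


seen = set()
# The first two entries are different SMILES for the same tetrol
inputs = ['Oc1cc(O)c(O)c(O)c1', 'c1c(O)cc(O)c(O)c1O', 'CCO']
new_flags = []
for smi in inputs:
    mol = Chem.AddHs(Chem.MolFromSmiles(smi))
    key = molecule_key(mol)
    new_flags.append(key not in seen)  # True the first time a molecule shows up
    seen.add(key)

print(new_flags)
```

Canonical SMILES can also change between RDKit releases, so keys stored on disk are best regenerated from the structures after an upgrade rather than compared across versions.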


On Fri, Jan 13, 2023 at 6:27 AM Eric Jonas  wrote:

> Hello! I use the crc of morgan fingerprints as a quick-and-dirty way to
> keep track of different molecules, but now I realize it might have been too
> quick and dirty! In particular, there appears to have been a change in the
> morgan code sometime between 2021.09.02 and 2022.03.05. The following code
> produces different output under these versions:
>
> import rdkit.Chem
> import pickle
> from rdkit import Chem
>
> import rdkit.Chem.rdMolDescriptors
> import zlib
>
> def get_morgan4_crc32(m):
>     mf = Chem.rdMolDescriptors.GetHashedMorganFingerprint(m, 4)
>     morgan4_crc32 = zlib.crc32(mf.ToBinary())
>     return morgan4_crc32
>
> mol = Chem.AddHs(Chem.MolFromSmiles('Oc1cc(O)c(O)c(O)c1'))
> print(get_morgan4_crc32(mol))
>
> 2021.09.2 : 1567135676
> 2022.03.5 : 204854560
>
> I tried looking at the release notes but I didn't seem to see any breaking
> changes (I might have missed them!) and I tried looking at "blame" for the
> relevant source but didn't see any seemingly-substantive changes within the
> relevant timeframe.
>
> So am I doing something crazy here, or did something change deliberately,
> or is it possible this is a bug?
>
> ...E
>


Re: [Rdkit-discuss] Question about tautomer hash

2023-01-11 Thread Greg Landrum
Hi Susan,

The current version of the tautomer hash doesn't do keto-enol tautomerism
(your first example). It would be worthwhile for us to add this as an
option, but it's not currently available.

-greg


On Wed, Jan 11, 2023 at 3:04 PM Susan Leung  wrote:

> Hi all,
>
>
>
> I am trying out the new registration hash and have a question about the
> tautomer hash. I think these two molecules (m1 and m2) should have the same
> tautomer hash but they are different. However, molecules m3 and m4 have the
> same hash. Please can you explain?
>
>
>
> import rdkit
> from rdkit import Chem
> from rdkit.Chem import Draw
> from rdkit.Chem import RegistrationHash
> from rdkit.Chem.RegistrationHash import HashLayer
>
> print(f'>> {rdkit.__version__}')
>
> m1 = Chem.MolFromSmiles('C=C(O)C')
> m2 = Chem.MolFromSmiles('CC(=O)C')
> h1 = RegistrationHash.GetMolLayers(m1)
> h2 = RegistrationHash.GetMolLayers(m2)
> print(f'>> {h1[HashLayer.TAUTOMER_HASH]}')
> print(f'>> {h2[HashLayer.TAUTOMER_HASH]}')
>
> m3 = Chem.MolFromSmiles('N=C(O)C')
> m4 = Chem.MolFromSmiles('NC(=O)C')
> h3 = RegistrationHash.GetMolLayers(m3)
> h4 = RegistrationHash.GetMolLayers(m4)
> print(f'>> {h3[HashLayer.TAUTOMER_HASH]}')
> print(f'>> {h4[HashLayer.TAUTOMER_HASH]}')
>
>
>
> >> 2022.09.1
> >> [CH2][C](C)[O]_1_0
> >> C[C](C)[O]_0_0
> >> C[C]([N])[O]_2_0
> >> C[C]([N])[O]_2_0
>
>
> Thanks!
>
>
> Susan


Re: [Rdkit-discuss] Fingerprint visualization drawings

2022-12-31 Thread Greg Landrum
Hi Ling,

Unfortunately there isn't any manual (or section of the manual) for the bit
rendering code and the API documentation is also missing.

There are some blog posts out there with more information:
http://rdkit.blogspot.com/2018/10/using-new-fingerprint-bit-rendering-code.html
https://iwatobipen.wordpress.com/2018/11/07/visualize-important-features-of-machine-leaning-rdkit/

If you google for the functions you're interested in you will probably find
additional resources.
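
As a starting point, here is a minimal sketch of the kind of call those posts describe, assuming Morgan bit-vector fingerprints; the example molecule and the choice of the first set bit are arbitrary:

```python
from rdkit import Chem
from rdkit.Chem import AllChem, Draw

mol = Chem.MolFromSmiles("c1ccccc1CC1CC1")

# bit_info is filled in by the fingerprinter: it maps each set bit to a
# tuple of (central atom index, radius) pairs that set that bit.
bit_info = {}
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048, bitInfo=bit_info)

bit_id = sorted(bit_info)[0]
atom_idx, radius = bit_info[bit_id][0]
print(f"bit {bit_id} is set by atom {atom_idx} at radius {radius}")

# Render the atom environment behind that bit; useSVG=True asks for SVG
# output instead of a PNG image.
drawing = Draw.DrawMorganBit(mol, bit_id, bit_info, useSVG=True)
```

In a Jupyter notebook the returned drawing can be displayed directly.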

I hope this helps,
-greg



On Fri, Dec 30, 2022 at 6:45 AM Ling Chan  wrote:

> Dear Colleagues,
>
> Happy New Year!
>
> I am trying to make some illustrations regarding the meaning of
> fingerprint bits. Thanks to Jan Jensen for the tips at the recent post of
> https://sourceforge.net/p/rdkit/mailman/message/37734537/ .
>
> I just wonder if there is any manual for the fingerprint drawing routines?
> I only managed to find the page
> http://rdkit.org/docs/source/rdkit.Chem.Draw.html . But there does not
> seem to be much detail there. For example, Jan's example in the above post
> would not be described. I wonder what other function arguments are
> available.
>
> Thank you.
>
> Ling
>


Re: [Rdkit-discuss] Adding hydrogen to conformations works srangely.

2022-12-20 Thread Greg Landrum
On Tue, Dec 20, 2022 at 11:11 AM Wim Dehaen  wrote:

> Hello,
> I think the place the hydrogens get lost is during the "MolFromMolBlock"
> operation. Try to add the flag *removeHs=False.*
>

Wim's answer is correct: by default the RDKit functions which construct
molecules from SMILES, Mol, etc. will remove Hs. This behavior can be
disabled if you want to keep the Hs.
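
A short sketch of the difference, using a mol block round trip for ethanol as an arbitrary example:

```python
from rdkit import Chem

# Ethanol with explicit hydrogens, written out as a mol block.
mol_h = Chem.AddHs(Chem.MolFromSmiles("CCO"))
mol_block = Chem.MolToMolBlock(mol_h)

default_read = Chem.MolFromMolBlock(mol_block)             # Hs are removed
keep_hs = Chem.MolFromMolBlock(mol_block, removeHs=False)  # Hs are kept

print(default_read.GetNumAtoms(), keep_hs.GetNumAtoms())   # 3 9
```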

-greg


Re: [Rdkit-discuss] CIPLabeler ranks

2022-12-19 Thread Greg Landrum
Hi Juuso,

Based on what you've described, I think you can use the canonical atom
ranks generated without tie breaking:

In [8]: m = Chem.MolFromSmiles('C[C@H]1OC[C@H](F)CC1(C)C')


In [9]: list(Chem.CanonicalRankAtoms(m,breakTies=False))
Out[9]: [0, 8, 6, 4, 7, 3, 5, 9, 1, 1]


This uses the same code that the canonical SMILES algorithm uses.

One thing to be aware of with this code: it does use atom map information
as part of the ranking. I don't think that this makes sense for your use
case, so if you have atom maps on the atoms, you probably want to remove
them before generating the ranks.
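
One way to do that, sketched here with a hypothetical helper (ranks_without_atom_maps is not an RDKit function) that works on a copy so the original atom maps are preserved:

```python
from rdkit import Chem


def ranks_without_atom_maps(mol):
    """Canonical ranks (ties not broken) computed on a copy with atom maps cleared."""
    work = Chem.Mol(mol)  # copy, so the caller's atom maps survive
    for atom in work.GetAtoms():
        atom.SetAtomMapNum(0)  # 0 means "no atom map"
    return list(Chem.CanonicalRankAtoms(work, breakTies=False))


mol = Chem.MolFromSmiles("C[C@H]1OC[C@H](F)CC1(C)C")
# Pretend that one of the two methyl groups arrived with an atom map on it.
mol.GetAtomWithIdx(8).SetAtomMapNum(1)

ranks = ranks_without_atom_maps(mol)
print(ranks)
```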

Best,
-greg





On Mon, Dec 19, 2022 at 2:22 PM Juuso Lehtivarjo 
wrote:

> Hi Greg,
>
> Thanks for your answer. After my post I got myself the Hanson et al.
> paper, and now understand better how the new algorithm works, and why there
> is no such thing as CIPRanks anymore.
>
> I use the CIPRanks for prochirality assignment, and subsequently those
> assignments in NMR grouping (=which nuclei are chemically equivalent in
> NMR). Briefly, I check if the neighbors of a stereogenic center have two
> equal CIPRanks, manipulate one of those to have priority over the other and
> re-calculate the stereochemical label, which then reveals the pro-R/pro-S
> assignment. I think the new code can be modified to do something similar,
> basically answering the question "These branches of the graph have the same
> priority, but what would be the stereo label if they would not?".
>
> Anyway, I hope that the legacy CIPRank code is not going to be removed in
> the future, since the subsequent part of my NMR grouping code still
> involves CIPRank manipulations & re-ranking - this can't be avoided since
> finally I still need a global ranking to tell which nuclei are chemically
> equivalent.
>
> Happy holidays!
> Juuso
>
> On Fri, Dec 16, 2022 at 5:24 PM Greg Landrum 
> wrote:
>
>> Hi Juuso,
>>
>> No, the new code does not calculate those ranks.
>> There may be alternatives though; what are you interested in using the
>> CIP ranks for?
>>
>> -greg
>>
>>
>> On Thu, Dec 15, 2022 at 3:45 PM Juuso Lehtivarjo <
>> juuso.lehtiva...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> According to my tests with the new CIPLabeler, it seems that it does not
>>> store the CIP ranking to any property, such as the _CIPRank prop that was
>>> filled in the legacy AssignStereochemistry, am I correct? Is it possible to
>>> retrieve this information somehow?
>>>
>>> Cheers,
>>> Juuso


[Rdkit-discuss] new RDKit FAQ

2022-12-17 Thread Greg Landrum
Dear all,

After thinking about it but not doing anything far, far too many times,
I've finally managed to start an RDKit FAQ. For the moment I'm using the
github wiki for this since it's easy and quite visible:
https://github.com/rdkit/rdkit/wiki/FrequentlyAskedQuestions

If you have ideas for things that should be on the FAQ, ideally with
answers, please feel free to reply here or, even better, post to the
relevant topic in the github discussions:
https://github.com/rdkit/rdkit/discussions/categories/faq

Best regards,
-greg


Re: [Rdkit-discuss] CIPLabeler ranks

2022-12-16 Thread Greg Landrum
Hi Juuso,

No, the new code does not calculate those ranks.
There may be alternatives though; what are you interested in using the CIP
ranks for?

-greg


On Thu, Dec 15, 2022 at 3:45 PM Juuso Lehtivarjo 
wrote:

> Hi,
>
> According to my tests with the new CIPLabeler, it seems that it does not
> store the CIP ranking to any property, such as the _CIPRank prop that was
> filled in the legacy AssignStereochemistry, am I correct? Is it possible to
> retrieve this information somehow?
>
> Cheers,
> Juuso


Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-22 Thread Greg Landrum
Hi Dave,

We have multiple examples of this in the code, here’s one:
https://github.com/rdkit/rdkit/blob/b208da471f8edc88e07c77ed7d7868649ac75100/Code/GraphMol/ForceFieldHelpers/Wrap/rdForceFields.cpp#L40

I’m not sure how this would interact with the call to Python::extract
that’s in the bulk functions, though.

It might be better to handle the multithreading on the C++ side by adding
an optional nThreads argument to the bulk similarity functions. (Though
this would have to wait for the next release since it’s a feature addition…
we can declare releasing the GIL as a bug fix.)
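
For reference, a minimal sketch of the Python-side pattern under discussion: one shared fingerprint list and a thread pool computing one row of the similarity matrix per task. The SMILES and the worker count are placeholders, and whether the threads actually run in parallel depends on whether the installed RDKit build releases the GIL inside BulkTanimotoSimilarity; the results are the same either way.

```python
from concurrent.futures import ThreadPoolExecutor

from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

smiles = ["CCO", "CCN", "c1ccccc1", "c1ccccc1O", "CC(=O)O", "CCCC"]
fps = [
    AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=1024)
    for s in smiles
]


def similarity_row(i):
    # All threads read the same fps list, so no per-worker copy is needed.
    return DataStructs.BulkTanimotoSimilarity(fps[i], fps)


with ThreadPoolExecutor(max_workers=4) as pool:
    rows = list(pool.map(similarity_row, range(len(fps))))
```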

-greg


On Sat, 22 Oct 2022 at 09:48, David Cosgrove 
wrote:

> Hi,
>
> I'm doing a lot of tanimoto similarity calculations on large datasets
> using BulkTanimotoSimilarity.  It is an obvious candidate for
> parallelisation, so I am using concurrent.futures to do so.  If I use
> ProcessPoolExecutor, I get good speed-up but each process needs a copy of
> the fingerprint set and for the sizes I'm dealing with that uses too much
> memory.  With ThreadPoolExecutor I only need 1 copy of the fingerprints,
> but the GIL means it only runs on 1 thread at a time so there's no gain.
> Would it be possible to amend the C++ BulkTanimotoSimilarity to free the
> GIL whilst it's doing the calculation, and recapture it afterwards?  I
> understand things like numpy do this for some of their functions.  I'm
> happy to attempt it myself if someone who knows about these things can
> advise that it could be done, it would help, and they could provide a few
> pointers.
>
> Thanks,
> Dave
>
>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>


Re: [Rdkit-discuss] Working with SDF from varying locales?

2022-09-30 Thread Greg Landrum
On Fri, Sep 30, 2022 at 4:35 PM Rocco Moretti  wrote:

> Hi Greg,
>
> > The RDKit doesn't normally convert data field values into floats unless
> you explicitly ask it to
>
> I did notice that mol.GetProp() will always return things by string, and
> you would need to use mol.GetDoubleProp() if you explicitly wanted a
> numeric value, but it looks like mol.GetPropsAsDict() will automatically
> convert to integers/floating point as appropriate. I guess I was wondering
> if there was a way to get GetPropsAsDict() to be more gregarious with the
> locale (and/or make GetDoubleProp() more robust to not raising an
> exception).
>

I don't believe that there is.

> But if I need to handle the locale re-parsing on my own, I can probably
> knock something together to do that.
>

I think this will be necessary, particularly since it sounds like you need
to try multiple locales anyway.
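
A rough sketch of what such a re-parser could look like in plain Python. The helper name parse_decimal and its heuristic are assumptions, not anything the RDKit provides: when both separators occur, whichever comes last is taken to be the decimal mark, and a lone "," is always read as a decimal mark, so a value like "1,234" is interpreted as 1.234.

```python
def parse_decimal(text):
    """Parse a number that uses either '.' or ',' as the decimal separator.

    Returns None if the text cannot be interpreted as a number.
    """
    cleaned = text.strip().replace(" ", "")
    last_dot = cleaned.rfind(".")
    last_comma = cleaned.rfind(",")
    if last_dot >= 0 and last_comma >= 0:
        if last_comma > last_dot:
            # e.g. "1.234,5": dots group thousands, comma is the decimal mark
            cleaned = cleaned.replace(".", "").replace(",", ".")
        else:
            # e.g. "1,234.5": commas group thousands, dot is the decimal mark
            cleaned = cleaned.replace(",", "")
    else:
        # Only one kind of separator (or none): treat ',' as a decimal mark.
        cleaned = cleaned.replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


for raw in ("3.14", "3,14", "1.234,5", "1,234.5", "n/a"):
    print(repr(raw), "->", parse_decimal(raw))
```

The string values returned by mol.GetProp() could be passed through a helper like this one, keeping the original string whenever it returns None.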



> Luckily the CTAB section in my files are all the same C locale, so I don't
> have to worry about that headache.
>

That's at least something to be grateful for! :-)

-greg


Re: [Rdkit-discuss] Working with SDF from varying locales?

2022-09-30 Thread Greg Landrum
Hi Rocco,

Paolo already replied about the options available for python when
interpreting the data fields from an SDF. The RDKit doesn't normally
convert data field values into floats unless you explicitly ask it to, so
this would be fine to do from Python.

The CTAB part of the SDF, which includes the coordinates, always parses the
coordinates using the C locale (regardless of what the current locale on
the machine is)... this is more or less part of the CTAB spec from MDL.

-greg


On Thu, Sep 29, 2022 at 8:16 PM Rocco Moretti  wrote:

> Hello,
>
> I have a number of SDFs of molecules with associated data blocks. (That
> is, the `>` section that comes after `M END` and before `$$$$`.)
>
> The problem I have is that these SDFs were generated in different
> countries, and have different locales -- most notably, some of them use "."
> as the decimal separator for real-valued properties and some use ",".  To
> make things even more fun, some use a mix of both, depending on who
> calculated which properties where.
>
> Is there any facility in RDKit for reading in such locale-varying SDF
> files and normalizing them?
>
> Thanks,
> Rocco


Re: [Rdkit-discuss] RDKit in Google Colab

2022-08-03 Thread Greg Landrum
Hi Eduardo,

In order for anybody to be able to help here we need more information: how
did you install the rdkit in the notebook, which versions of everything
else are you using, etc.
The easiest way to answer this would be to just include a link to the colab
notebook itself.

-greg


On Wed, Aug 3, 2022 at 3:44 PM Eduardo Mayo 
wrote:

> Hello,
>
> I have used RDKit in a Google Colab before (a few months ago). However,
> when I tried today, I got the following error message:
>
> ImportError: /usr/local/lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not
> found (required by
> /usr/local/lib/python3.7/site-packages/rdkit/Chem/../../../../libRDKitFileParsers.so.1)
>
>
> Does anyone know a workaround?
>
> All the best,
> Eduardo


Re: [Rdkit-discuss] Suggestions on improving RDKIT

2022-07-27 Thread Greg Landrum
Hi,

PATH is not a variable used by the RDKit, that's something which is used by
your operating system, so you'd need to check however your operating system
handles non-ASCII characters.

The RDKit does use the variable RDBASE, which is handled internally by
reading it into an 8 bit string, so I guess there we are limited to things
like UTF-8. I will take a look and see if there's a way we can extend that
to allow generic unicode support.

-greg


On Wed, Jul 27, 2022 at 8:11 AM Sun, Peike via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> To whom it may concern,
>
> This is Peike Sun, an undergraduate student from King's College London. I
> am following my professor to work on an interesting project in the
> summertime, and we are using rdkit. When I tried to download rdkit, I found
> the PATH would not be recognized unless it is written in English
> characters.
>
> So, I am writing to sincerely ask, could you make it more general such
> that PATH written in all languages will be recognized? I do appreciate your
> efforts!
>
> I wish you a really lovely summer!
>
>
> Best regards,
> Peike


Re: [Rdkit-discuss] Expected behaviour for rdkit.do_enhanced_stereo_sss in the postgres cartridge?

2022-07-20 Thread Greg Landrum
Hi Susan,

On Wed, Jul 20, 2022 at 4:32 PM Susan Leung  wrote:

>
> I just noticed this because I found other toolkits have the exact opposite
> behaviour. I can see the point of view from both sides.
>
> With other toolkits, we can search for chirally pure molecules; however,
> in RDKit we cannot (e.g. using @ in a query would return @, OR and AND).
>

You are correct that there's not currently a way to do a search such that
specified chirality is not a substructure match of enhanced stereo. That
would need to be done in a post-processing step and isn't trivial. That's
something we could add.


> Also it seems with RDKit, using a more general query e.g. OR, gives fewer
> hits than a more restrictive query e.g. ABS.
>

That's an interesting point which, I think, comes down to whether or not OR
(general) is a substructure of @ (specific). At the moment it's not, but I
think there's an argument to be made either way.

In case it's useful, here's a gist which shows the matching rules:
https://gist.github.com/greglandrum/29e3c72b401ed5d88726be05908f100f

-greg


Re: [Rdkit-discuss] Expected behaviour for rdkit.do_enhanced_stereo_sss in the postgres cartridge?

2022-07-20 Thread Greg Landrum
Hi Susan,

The expected behavior for substructure search with enhanced stereo is
documented here:
https://www.rdkit.org/docs/RDKit_Book.html#enhanced-stereochemistry-and-substructure-search

A quick explanation of the logic, assuming a single stereocenter for
simplicity:
- OR contains either the @ or the @@ form
- AND contains both the @ and the @@ form.

So OR is automatically a substructure of AND, but AND can never be a
substructure of OR.
ABS is clearly a substructure of AND (but not vice versa).
Whether or not ABS is a subset of OR isn't obvious; the code currently says
that it is, but one could argue the other way.
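
If you want to check these rules outside the cartridge, here is a small sketch using the Python API, assuming the useEnhancedStereo substructure parameter mirrors what rdkit.do_enhanced_stereo_sss does in the cartridge. The single-stereocenter molecule is an arbitrary example and the loop just prints the full query/target table for your RDKit version:

```python
from rdkit import Chem

params = Chem.SubstructMatchParameters()
params.useChirality = True
params.useEnhancedStereo = True

# The same stereocenter as ABS (plain @), in an OR group, and in an AND group.
mols = {
    "ABS": Chem.MolFromSmiles("C[C@H](F)Cl"),
    "OR": Chem.MolFromSmiles("C[C@H](F)Cl |o1:1|"),
    "AND": Chem.MolFromSmiles("C[C@H](F)Cl |&1:1|"),
}

for query_name, query in mols.items():
    for target_name, target in mols.items():
        hit = target.HasSubstructMatch(query, params)
        print(f"query {query_name:>3} vs target {target_name:>3}: {hit}")
```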

Does that make sense?

-greg


On Wed, Jul 20, 2022 at 2:11 PM Susan Leung  wrote:

> Hi all,
>
>
>
> I am trying to do substructure search, taking into account enhanced
> stereochemistry using the postgres cartridge.
>
>
>
> I am finding :
>
> with an absolute query, it matches AND stereochemistry and OR
> stereochemistry,
>
> With an OR query it matches AND stereochemistry,
>
> With an AND query it matches neither.
>
>
>
> Is this expected? Or please can anyone clarify the expected behaviour
> with rdkit.do_enhanced_stereo_sss=true? (Or point me to some
> documentation if I've missed it!)
>
>
> Thanks!
>
>
> Susan


Re: [Rdkit-discuss] What is the recommended 3D-sensitive file format to use with RDKit?

2022-06-16 Thread Greg Landrum
Hi Francois,

Yes, I would recommend SDF, the V3000 version if possible. The xyz format
is problematic because it doesn’t have bonds. There is still a way to kind
of work with that together with SMILES or sdf though:
https://mattermodeling.stackexchange.com/questions/7234/how-to-input-3d-coordinates-from-xyz-file-and-connectivity-from-smiles-in-rdkit

As for mol2: the RDKit has some support, but only for the Corina version of
mol2. I would avoid that format if at all possible.
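
For the SDF route, a minimal sketch of writing and re-reading a 3D structure as a V3000 mol block; the molecule and the random seed are arbitrary:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMolecule(mol, randomSeed=42)  # generate 3D coordinates

# forceV3000=True writes the V3000 flavor even for small molecules.
block = Chem.MolToMolBlock(mol, forceV3000=True)

# removeHs=False keeps the explicit hydrogens and their coordinates.
restored = Chem.MolFromMolBlock(block, removeHs=False)
print(restored.GetNumAtoms(), restored.GetConformer().Is3D())
```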


-greg

On Thu, 16 Jun 2022 at 13:58, Francois Berenger  wrote:

> Hi all,
>
> I assume it's ".sdf".
>
> But, do we have good support for ".xyz" also?
>
> In addition, what about RDKit's support of ".mol2" these days?
>
> Regards,
> F.
>
>


Re: [Rdkit-discuss] Different 3D descriptors depending on mol reading method

2022-06-15 Thread Greg Landrum
Hi,

I guess the differences you are seeing are arising because you have
different conformers of the molecule.
The conformer generation process in EmbedMolecule() uses a stochastic
procedure and if you want to be sure that you get the same results from
multiple runs you need to provide a random seed using the randomSeed
argument.
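
A quick sketch of what that looks like; the molecule and the seed value are arbitrary, and embed is just a local helper:

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem


def embed(smiles, seed):
    """Return the coordinates of one conformer generated with a fixed seed."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMolecule(mol, randomSeed=seed)
    return mol.GetConformer().GetPositions()


first = embed("CC(C)Cc1ccccc1", seed=42)
second = embed("CC(C)Cc1ccccc1", seed=42)
print(np.allclose(first, second))
```

With the same seed, both routes start from the same conformer, so any remaining difference would have to come from a later step.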

Please give that a try and see if it helps,
-greg




On Tue, Jun 14, 2022 at 9:15 PM J Sousa  wrote:

> I'm trying RDKit to calculate 3D descriptors, but I get significantly
> different descriptors if I read molecules from a SMILES file (and
> clean/optimize the 3D structure before calculating the descriptors) or if I
> read the SDF file obtained from exactly the same SMILES file using exactly
> the same code to optimize the structures.
>
> Scripts attached.
>
> Running smiltodesc_check.py produces descr_myfile.txt
>
> Running gen3D_check.py and then descr_from_sdf_check.py produces
> myfile_descr.txt
>
> But the two files are significantly different.
>
> Why aren't they the same? Which is wrong?
>
> JSousa


Re: [Rdkit-discuss] about SMILES

2022-06-13 Thread Greg Landrum
Hi Jean-Marc,

The question about atom data was answered elsewhere by Nils, but on atom
ordering:

On Mon, Jun 13, 2022 at 2:50 PM Jean-Marc Nuzillard <
jm.nuzill...@univ-reims.fr> wrote:

>
> About mol = Chem.MolFromSmiles(smi), I would like to know
> whether the atoms indexes in mol follow always exactly the apparition
> order of
> the atoms in smi.
>

The RDKit preserves the atom ordering. The only exception to this is that
by default any hydrogens which are present in the SMILES will be removed
(you can turn this off), so 'FC(O[H])Br' ends up being a four atom molecule
with indices 0:F, 1:C, 2:O, 3:Br
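
To see this directly, and to see how the hydrogen removal can be turned off with SmilesParserParams:

```python
from rdkit import Chem

smi = "FC(O[H])Br"

# Default parsing: the explicit H is removed, the other atoms keep their order.
mol = Chem.MolFromSmiles(smi)
symbols = [atom.GetSymbol() for atom in mol.GetAtoms()]
print(symbols)  # ['F', 'C', 'O', 'Br']

# With removeHs disabled the H stays in the graph at its SMILES position.
params = Chem.SmilesParserParams()
params.removeHs = False
mol_h = Chem.MolFromSmiles(smi, params)
symbols_h = [atom.GetSymbol() for atom in mol_h.GetAtoms()]
print(symbols_h)  # ['F', 'C', 'O', 'H', 'Br']
```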

-greg


Re: [Rdkit-discuss] SMARTS pattern

2022-06-07 Thread Greg Landrum
Hi Eduardo,

If I'm understanding what you want to do correctly, then you could try
extending your SMARTS pattern to include a ring bond to a neighbor from
each atom in the ring:
*@*~1~*(@*)~*(@*)~*(@*)~*(@*)~*~1@*

If you only want the indices of the ring atoms, you can then just pick
those out of the match results you get back
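
A sketch of that last step. The positions of the six ring atoms within the pattern (1, 2, 4, 6, 8, 10) follow from the order in which the atoms are written in the SMARTS above; triphenylene is used here only as a stand-in test molecule whose central ring has a ring-bonded neighbor on every atom.

```python
from rdkit import Chem

pattern = Chem.MolFromSmarts("*@*~1~*(@*)~*(@*)~*(@*)~*(@*)~*~1@*")
# Query atom indices of the six ring atoms; the other six are the neighbors.
RING_QUERY_IDX = (1, 2, 4, 6, 8, 10)

triphenylene = Chem.MolFromSmiles("c1ccc2c(c1)c1ccccc1c1ccccc12")

# Reduce every match to the set of ring atoms it covers; the set
# comprehension collapses matches that differ only in atom ordering.
ring_atom_sets = {
    frozenset(match[i] for i in RING_QUERY_IDX)
    for match in triphenylene.GetSubstructMatches(pattern)
}
for ring in ring_atom_sets:
    print(sorted(ring))
```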

-greg


On Tue, Jun 7, 2022 at 7:23 PM Eduardo Mayo 
wrote:

> Greetings!!
>
> I hope this email finds you well.
>
> I need a SMARTS pattern that matches this molecule fragment
> [image: image.png]
> The first pattern I used was:
> [*;R2]~1~[*;R2]~[*;R2]~[*;R2]~[*;R2]~[*;R2]~1
>
> However, it also matches this fragment. This is not the expected behavior
> but it agrees with the pattern, so I tried adding the ring size constraint.
> [image: image.png]
> Now the pattern I am using is this:
> [*;R2r6]~1~[*;R2r6]~[*;R2r6]~[*;R2r6]~[*;R2r6]~[*;R2r6]~1
>
> It worked quite well but now it fails to find matches in this molecule
> [image: image.png]
>
> Does anyone know what I am doing wrong??
>
> Code:
> ---
>
> m1 = Chem.MolFromSmiles(
> "c1ccc2cc3c(ccc4c5c5c5cc6c7cc8c(cc7c6cc5c34)c3cccnc38)cc2c1")
> m2 = Chem.MolFromSmiles(
> "b12c1c1c(c3ccc4ccc4c3c3c4c5cc[nH]c5c4c13)c1ncc3c3c21")
> m3 = Chem.MolFromSmiles(
> "b1ccbc2c1c1ccoc1c1c2c2ccsc2c2[nH]c3ncc4c(c3c21)=c1n1=4")
>
> p = Chem.MolFromSmarts("[*;R2]~1~[*;R2]~[*;R2]~[*;R2]~[*;R2]~[*;R2]~1")
> for m, expected_value in zip([m1,m2,m3],[1,2,2]):
>     print(len(m.GetSubstructMatches(p)) == expected_value)
>
>
> p = Chem.MolFromSmarts(
> "[*;R2r6]~1~[*;R2r6]~[*;R2r6]~[*;R2r6]~[*;R2r6]~[*;R2r6]~1")
> for m, expected_value in zip([m1,m2,m3],[1,2,2]):
>     print(len(m.GetSubstructMatches(p)) == expected_value)
>
> All the best,
> Eduardo
>


Re: [Rdkit-discuss] SGroup information in SD files

2022-05-24 Thread Greg Landrum
Hi Thomas,

On Tue, May 24, 2022 at 10:43 PM 
wrote:

>
>
> how would I get the SGroups information into a ROMol so that this is
> output when I write the ROMol to a SDFile?
>
> I have seen the SubstanceGroup class, but haven’t found any example how to
> use this in a python context.
>
>
You can use the function Chem.CreateMolSubstanceGroup() to attach a new
SGroup to a molecule (as Dan pointed out elsewhere in this thread, you
can't directly create an SGroup).
There's an example of how to do this here:
https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Wrap/testSGroups.py#L194
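
A stripped-down sketch along the lines of that test, assuming an SRU group is what you need. The molecule, the atom indices and the label are arbitrary, and a complete SRU would normally also carry crossing bonds and brackets.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCOC")

# The SGroup is created on (and owned by) the molecule.
sgroup = Chem.CreateMolSubstanceGroup(mol, "SRU")
sgroup.AddAtomWithIdx(1)
sgroup.AddAtomWithIdx(2)
sgroup.SetProp("LABEL", "n")

for sg in Chem.GetMolSubstanceGroups(mol):
    print(sg.GetProp("TYPE"), list(sg.GetAtoms()))

# Write the molecule as a V3000 mol block, the form used in the linked test.
block = Chem.MolToMolBlock(mol, forceV3000=True)
```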

I hope this helps,
-greg



Re: [Rdkit-discuss] permutations of symmetric atoms

2022-04-16 Thread Greg Landrum
Hi Diogo,

The easiest way to do this is to use the substructure matching code with
"uniquify=False" to find all the automorphisms between a molecule and
itself:
In [8]: m1 = Chem.MolFromSmiles('Oc1ccccc1')

In [9]: list(m1.GetSubstructMatches(m1,uniquify=False))
Out[9]: [(0, 1, 2, 3, 4, 5, 6), (0, 1, 6, 5, 4, 3, 2)]

Here's another example:
In [10]: m = Chem.MolFromSmiles('Oc1ccc(c2ccc(Cl)cc2)cc1')

In [11]: list(m.GetSubstructMatches(m,uniquify=False))
Out[11]:
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13),
 (0, 1, 2, 3, 4, 5, 11, 10, 8, 9, 7, 6, 12, 13),
 (0, 1, 13, 12, 4, 5, 6, 7, 8, 9, 10, 11, 3, 2),
 (0, 1, 13, 12, 4, 5, 11, 10, 8, 9, 7, 6, 3, 2)]


I hope this helps,
-greg


On Sat, Apr 16, 2022 at 1:46 AM Diogo Martins 
wrote:

> Hello,
>
> I'd like to enumerate all possible permutations of symmetric atoms.
> Consider the following code:
>
> phenol = Chem.MolFromSmiles("Oc1ccccc1")
> equivalencies = list(Chem.CanonicalRankAtoms(phenol, breakTies=False))
> print(equivalencies)
> [0, 6, 4, 2, 1, 2, 4]
>
> Atoms that have the same value in list "equivalencies" are symmetric. For
> phenol, the equivalent atoms correspond to a 180 degree rotation of the
> aromatic ring over the axis containing the carbon-oxygen bond. The possible
> permutations, expressed as atom indices, are:
> [0, 1, 2, 3, 4, 5, 6]
> [0, 1, 6, 5, 4, 3, 2]
>
> By permutations, I mean that it is possible to replace the coordinates of
> the atoms and produce a realistic molecule.
>
> A brute force approach comes to mind, where one would enumerate all
> possible combinations, and exclude those that change the molecular graph.
> In the example above there are four possible combinations, because there
> are two groups of two symmetric atoms. An example of an "invalid"
> combination is swapping the third and seventh atoms without swapping the
> fourth and sixth atoms:
> [0, 1, 6, 3, 4, 5, 2]
> This would be excluded as it breaks the bond between the third and fourth
> atoms (among other bonds).
>
> Is there a method in the RDKit to enumerate the valid permutations?
>
> Thank you,
> Diogo
>


Re: [Rdkit-discuss] pharmacophore

2022-03-29 Thread Greg Landrum
Hi Mu,

The RDKit has code for identifying pharmacophoric points and calculating
the distances between them, but there is no pharmacophore perception tool
in the core RDKit.
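
A minimal sketch of the part that is available, assuming the BaseFeatures.fdef definitions that ship with the RDKit. The ligand here is an arbitrary SMILES with an embedded conformer; for a co-crystallized ligand you would read the molecule with its experimental coordinates instead.

```python
import os
from itertools import combinations
from math import dist

from rdkit import Chem, RDConfig
from rdkit.Chem import AllChem, ChemicalFeatures

factory = ChemicalFeatures.BuildFeatureFactory(
    os.path.join(RDConfig.RDDataDir, "BaseFeatures.fdef")
)

mol = Chem.AddHs(Chem.MolFromSmiles("NCCc1ccccc1O"))
AllChem.EmbedMolecule(mol, randomSeed=42)  # feature positions need 3D coordinates

features = factory.GetFeaturesForMol(mol)
for f1, f2 in combinations(features, 2):
    p1, p2 = f1.GetPos(), f2.GetPos()
    distance = dist((p1.x, p1.y, p1.z), (p2.x, p2.y, p2.z))
    print(f"{f1.GetFamily()} - {f2.GetFamily()}: {distance:.2f}")
```

Deciding which of these points and distances make up a pharmacophore is the perception step that is left to you.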

Best regards,
-greg



On Tue, Mar 29, 2022 at 12:25 PM Muhammad Akram 
wrote:

> Hello Everybody,
>
>
>
> I am looking if there is a way to extract a pharmacophore from
> co-crystallized ligand using RDKit.
>
>
>
> Thank you so much in advance.
>
>
>
> Kind Regards,
>
> Mu


Re: [Rdkit-discuss] Forcing depiction to match input mol block

2022-03-23 Thread Greg Landrum
Hi Adam,

By default the RDKit picks the locations to do wedging. There's no built-in
function to reapply the wedging from an input mol file (we probably should
add this), but you can find a good one here, as part of the ChEMBL
structure pipeline:
https://github.com/chembl/ChEMBL_Structure_Pipeline/blob/master/chembl_structure_pipeline/standardizer.py#L469

-greg


On Tue, Mar 22, 2022 at 11:01 AM Ádám Baróthi 
wrote:

> Hello Everyone,
>
> I'm having some trouble trying to depict molecules the exact same way as
> the input mol block (V2000) was drawn. My main problem is that I've drawn
> in a wedge bond between atoms 1 and 7 (the right hand side of the
> cyclopropyl ring), and RDKit depicts the molecule with a wedge bond between
> atoms 0 and 1 (the left hand side of the cyclohexane ring).
>
> Google colab showing the difference between the structures
> 
>
> Is there a way to force the depiction to match the input exactly without
> having to set the bond types manually?
>
> Molecule in question:
>
>   ACCLDraw0310522D
>
>  11 12  0  0  1  0  0  0  0  0999 V2000
>    15.0356   -8.9726    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    16.0593   -8.3815    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
>    17.0831   -8.9726    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    17.0831  -10.1547    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    16.0593  -10.7457    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
>    15.0356  -10.1547    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    16.0593  -11.9269    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>    16.6510   -7.3567    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
>    15.4680   -7.3567    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    17.6738   -6.7662    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    18.6967   -7.3567    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0  0  0  0
>   2  3  1  0  0  0  0
>   3  4  1  0  0  0  0
>   4  5  1  0  0  0  0
>   5  6  1  0  0  0  0
>   1  6  1  0  0  0  0
>   5  7  1  1  0  0  0
>   2  8  1  1  0  0  0
>   8  9  1  0  0  0  0
>   2  9  1  0  0  0  0
>   8 10  1  0  0  0  0
>  10 11  1  0  0  0  0
> M  END
>
> Best,
> Adam


Re: [Rdkit-discuss] Beta of the 2022.03.1 release available

2022-03-19 Thread Greg Landrum
Thanks for letting me know Markus. I'm glad everything looks good so far!

On Fri, Mar 18, 2022 at 11:05 PM Markus Sitzmann 
wrote:

> Hi Greg,
>
> to give you some feedback: I switched my current research project to the
> beta version and didn't find any problem yet ;-)
>
> Best,
> Markus
>
> On Fri, Mar 18, 2022 at 1:32 PM Greg Landrum 
> wrote:
>
>> Dear all,
>>
>> I tagged the first beta of the 2022.03 RDKit release this morning.
>> Assuming nothing weird shows up during testing, we'll do the actual
>> release on the 25th.
>>
>> You can find the new beta here:
>> https://github.com/rdkit/rdkit/releases/tag/Release_2022_03_1b1
>>
>> Conda builds of the beta are available in the rdkit channel for python
>> 3.8 on Mac and Linux:
>> conda install -c rdkit/label/beta rdkit rdkit=2022.03
>>
>> Please try out the beta and let us know if you find any problems!
>>
>> Best regards,
>> -greg
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Beta of the 2022.03.1 release available

2022-03-18 Thread Greg Landrum
Dear all,

I tagged the first beta of the 2022.03 RDKit release this morning. Assuming
nothing weird shows up during testing, we'll do the actual release on the
25th.

You can find the new beta here:
https://github.com/rdkit/rdkit/releases/tag/Release_2022_03_1b1

Conda builds of the beta are available in the rdkit channel for python 3.8
on Mac and Linux:
conda install -c rdkit/label/beta rdkit rdkit=2022.03

Please try out the beta and let us know if you find any problems!

Best regards,
-greg
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] problem saving rdSubstructLibrary.

2022-03-12 Thread Greg Landrum
Hi Pat,

I don't think you're doing anything wrong. This looks like a bug in the
RDKit.
It seems to be connected to the PatternHolder... I will look into it.

-greg


On Sat, Mar 12, 2022 at 10:26 PM Patrick Walters 
wrote:

> Hi All,
>
> I'd appreciate any insight on what I'm doing wrong.  I'm trying to save an
> rdSubstructLibrary with library.ToStream().  When the library is empty I can
> save it with library.ToStream(); however, when I've added molecules
> to the library, I get this error message.
>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 121:
> invalid continuation byte
>
> Example code below.  Any suggestions would be appreciated.
>
> Thanks,
>
> Pat
>
> #!/usr/bin/env python
>
> import sys
> from rdkit import Chem
> from rdkit.Chem import rdSubstructLibrary
>
> smiles_list = ["C","CC","CCC","","C"]
> mol_list = [Chem.MolFromSmiles(x) for x in smiles_list]
> library = rdSubstructLibrary.SubstructLibrary(
>     rdSubstructLibrary.CachedSmilesMolHolder(),
>     rdSubstructLibrary.PatternHolder())
> # Error when molecules are added
> # If the two lines below are commented, everything works
> for mol in mol_list:
>     library.AddMol(mol)
> # -
> with open("out.sslib","w") as f:
>     library.ToStream(f)
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] How to get the electronegativity of an atom?

2022-03-09 Thread Greg Landrum
Hi Francois,

Though there are various tables with electronegativity values in the code,
they are all for particular purposes (e.g. the MMFF force field), not
directly documented, and not really intended to be used elsewhere.

I think your idea of parsing the data from the BODR is the best approach.
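
A minimal sketch of that parsing step, using only the Python standard library. The embedded XML is a hypothetical two-element excerpt written in the style of the BODR elements.xml file (it assumes each element is an `atom` entry with `scalar` children labelled by `dictRef`); the function name `pauling_table` is likewise just illustrative. Check the layout against the real file before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the style of the BODR elements.xml file (assumed layout)
SAMPLE_XML = """<list xmlns="http://www.xml-cml.org/schema"
      xmlns:bo="http://www.blueobelisk.org/dict/terms.html">
  <atom id="C">
    <scalar dataType="xsd:integer" dictRef="bo:atomicNumber">6</scalar>
    <scalar dataType="xsd:float" dictRef="bo:electronegativityPauling">2.55</scalar>
  </atom>
  <atom id="O">
    <scalar dataType="xsd:integer" dictRef="bo:atomicNumber">8</scalar>
    <scalar dataType="xsd:float" dictRef="bo:electronegativityPauling">3.44</scalar>
  </atom>
</list>"""


def pauling_table(xml_text):
    """Map atomic number -> Pauling electronegativity for entries that have both."""
    table = {}
    for node in ET.fromstring(xml_text).iter():
        # Tags carry the XML namespace, so match on the local name only
        if not node.tag.endswith("atom"):
            continue
        values = {child.get("dictRef"): child.text for child in node}
        if "bo:atomicNumber" in values and "bo:electronegativityPauling" in values:
            table[int(values["bo:atomicNumber"])] = float(
                values["bo:electronegativityPauling"])
    return table


table = pauling_table(SAMPLE_XML)
# Electronegativity difference across a C-O bond
delta = abs(table[6] - table[8])
```

With a table like this, the value for an RDKit atom can be looked up by `atom.GetAtomicNum()`, and the difference for a bond from its two end atoms.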

-greg


On Tue, Mar 8, 2022 at 9:42 AM Francois Berenger  wrote:

> On 08/03/2022 17:23, Francois Berenger wrote:
> > Dear rdkit experts,
> >
> > I am looking to access the electronegativity value
> > of a given atom in a molecule.
> >
> > Funnily, I don't know _at_ _all_ how to do this.
> >
> > I guess that there should be a way using the atomic number
> > to get this value from a table inside of rdkit
> > but my code searches on github were somewhat in vain (we do have a
> > constant table
> > somewhere with the Allen scale values * 1000, but that
> > doesn't look very practical to use).
> >
> > Before I do the horrible (hard-coding into my program
> > the whole Allen scale copy-pasted from wikipedia, yes, nothing less),
> > I prefer to ask rdkit experts.
>
> A less horrible option might be to extract the
> bo:electronegativityPauling
> values from the elements.xml file of the latest version of the Blue
> Obelisk project.
>
> But, I would be quite surprised if rdkit doesn't hold this value already
> somewhere...
>
> > If we have a choice between the Pauling scale or the Allen
> > scale, I would be interested to know about that.
> > If we can directly access the difference of ElNeg for two
> > bonded atoms of a molecule, I might be happy living with that.
> >
> > Thanks a lot,
> > F.
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Find structures with "non-organic" atoms

2022-03-06 Thread Greg Landrum
One minor refinement... Pat's answer can be made a bit more efficient by
replacing the atom types which appear in both aromatic and aliphatic forms
with the corresponding atom number queries and by moving the !#1 to the
end. This should produce the same results as Pat's SMARTS:
not_organic_pat =
Chem.MolFromSmarts("[!#6;!#8;!#7;!#16;!#15;!F;!Cl;!Br;!I;!Na;!K;!Mg;!Ca;!Li;!#1]")
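
For anyone who wants to use this as a filter, here is a small sketch wrapping the pattern in a helper. The function name `has_non_organic_atom` and the extra test SMILES (thiophene, ethanol) are only for illustration; the cisplatin SMILES is the one from Pat's message.

```python
from rdkit import Chem

not_organic_pat = Chem.MolFromSmarts(
    "[!#6;!#8;!#7;!#16;!#15;!F;!Cl;!Br;!I;!Na;!K;!Mg;!Ca;!Li;!#1]")


def has_non_organic_atom(smiles):
    """True if the molecule has at least one atom outside the allowed elements."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return mol.HasSubstructMatch(not_organic_pat)


# Cisplatin contains Pt; thiophene and ethanol contain only allowed elements
flags = {smi: has_non_organic_atom(smi)
         for smi in ("[NH3+]-[Pt-2](Cl)(Cl)[NH3+]", "c1ccsc1", "CCO")}
```

Because the atomic-number queries (#16 etc.) match both aromatic and aliphatic atoms, the aromatic sulfur in thiophene is not flagged.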

-greg


On Sun, Mar 6, 2022 at 3:19 AM Patrick Walters  wrote:

> Here's what I use.
>
> not_organic_pat =
> Chem.MolFromSmarts("[!#1;!C;!O;!N;!S;!P;!F;!Cl;!Br;!I;!c;!o;!n;!s;!p;!Na;!K;!Mg;!Ca;!Li]")
> cisplatin = Chem.MolFromSmiles("[NH3+]-[Pt-2](Cl)(Cl)[NH3+]")
> cisplatin.HasSubstructMatch(not_organic_pat)
>
>
>
> On Sat, Mar 5, 2022 at 8:08 PM Rafael L via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
>
>> Dear all, I remember having used some SMARTS-based function to flag
>> structures containing "non-organic" atoms. It seems that there is a knime
>> node for that (
>> https://forum.knime.com/t/rdkit-molecule-substructure-filter-incorrectly-matches-aromatic-sulfur-atoms-molecule-as-metal-containing-compounds/12935),
>> but I wasn't able to find any Python implementations. Does anyone know
>> where to find it?
>> Thanks in advance
>>
>> --
>> *Rafael da Fonseca Lameiro*
>> PhD Student - Medicinal and Biological Chemistry Group (NEQUIMED)
>> São Carlos Institute of Chemistry - University of São Paulo - Brazil
>> [image: orcid logo 16px] https://orcid.org/-0003-4466-2682
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] 2022 RDKit UGM registration open

2022-03-03 Thread Greg Landrum
Dear all,

This year's RDKit UGM will take place in Berlin, Germany from 12-14
October. The organizers are the Machine Learning Research group from Bayer
Pharma.

We're planning the UGM as an in-person event, but we will, assuming the
technology works, also live stream the sessions.

Free registration for both in-person and virtual attendance is open here:
https://www.eventbrite.com/e/11th-rdkit-ugm-2022-tickets-289161448677

If you're interested in contributing a talk, poster, lightning talk,
tutorial, etc. to the UGM, please submit your ideas here:
https://forms.gle/J2eHVkNjh4ngg1e76

We're really looking forward to another great UGM and to getting to see
some of you in Berlin!

Best regards,
Greg and the rest of the organizers
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] specifying deprotection

2022-02-20 Thread Greg Landrum
Hi Charmaine,

Sorry for the slow reply.

On Wed, Feb 16, 2022 at 4:04 PM Charmaine Siu Man Chu <
charmaine@liverpoolchirochem.com> wrote:

>
>
> I’m currently looking at the deprotection function within
> Chem.rdDeprotect. Is it possible to specify a specific deprotection from
> the in-built deprotection list and if so how can I go about doing that?
>

It is possible. The Deprotect function takes an optional second argument
with a list of deprotections to use.

Here's a simple example which collects the set of deprotections which apply
to amines and then applies them to a molecule:

from rdkit.Chem import rdDeprotect

only_amines = [x for x in rdDeprotect.GetDeprotections()
               if x.deprotection_class == 'amine']
deprotected_mol = rdDeprotect.Deprotect(mol, only_amines)


And here's the result of that:
[image: image.png]


Best regards,
-greg
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] molecule layout to optimise available space

2022-02-11 Thread Greg Landrum
Oh, you want the layout itself changed, not just the orientation.

No, there's nothing in place to do that and adding such a thing would be
extremely non-trivial.

-greg


On Fri, Feb 11, 2022 at 3:49 PM Tim Dudgeon  wrote:

> Hi Greg,
> yes, but my situation is that the X dimension is much larger than the Y
> and most of the time things are aligned nicely. But not always. Here is an
> example.
> OC(C(=O)NC=1C=CC=CC1NS(=O)(=O)C=2C=CC(F)=CC2)C=3C=CC=NC3
> [image: image.png]
> Clearly there is potential to lay this out using more of the X dimension
> and less of the Y.
>
> Tim
>
> On Fri, Feb 11, 2022 at 1:57 PM Greg Landrum 
> wrote:
>
>> Hi Tim,
>>
>> That's a nice one.
>>
>> For people not familiar with the problem:
>> The RDKit coordinate generation prefers aligning molecules with the X
>> axis; this can lead to "sub-optimal" drawings if your drawing canvas is
>> taller than it is wide.
>>
>> One easy solution is to just generate coordinates as usual and then
>> rotate them to favor the Y axis if your canvas is larger along Y.
>> Here's a gist showing how to do that:
>> https://gist.github.com/greglandrum/12b793b240d27e3c0899c9c6c62d4f30
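>>
>> The rotation idea can be sketched in plain Python. This is not
>> necessarily what the gist does; it is one possible approach, assuming
>> the 2D coordinates are available as (x, y) pairs. The function name
>> align_long_axis is hypothetical. It finds the longest direction of the
>> point cloud from its second moments and rotates it onto the chosen axis.

```python
import math


def align_long_axis(coords, favor="x"):
    """Rotate 2D points about their centroid so the longest direction lies on x or y."""
    n = len(coords)
    cx = sum(x for x, _ in coords) / n
    cy = sum(y for _, y in coords) / n
    centered = [(x - cx, y - cy) for x, y in coords]
    # Second moments of the centered point cloud
    sxx = sum(x * x for x, _ in centered)
    syy = sum(y * y for _, y in centered)
    sxy = sum(x * y for x, y in centered)
    # Angle of the principal (longest) axis, measured from the x axis
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # Rotate by -theta to lie along x; add a quarter turn to lie along y
    angle = -theta + (math.pi / 2 if favor == "y" else 0.0)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in centered]


# Four points on a diagonal end up on the x axis (or the y axis with favor="y")
pts = [(0, 0), (1, 1), (2, 2), (3, 3)]
along_x = align_long_axis(pts, "x")
along_y = align_long_axis(pts, "y")
```

>> Note that the returned points are centered on the centroid; in an
>> RDKit workflow they would be read from, and written back to, the
>> molecule's 2D conformer.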
>>
>> -greg
>>
>>
>> On Fri, Feb 11, 2022 at 10:20 AM Tim Dudgeon 
>> wrote:
>>
>>> At Dave Cosgrove's suggestion I raise this as a new topic, though it was
>>> touched on briefly recently.
>>>
>>> I'd like to know if it's possible to depict a molecule in a way that
>>> takes into account the dimensions of the box it will appear in. In my case
>>> I have a rectangle that is short and wide (aspect ratio of 1:3) and the
>>> molecules are typically compressed because of the lack of available height.
>>> So is it possible to make the layout engine aware of the bounds it has
>>> available (e.g. in my example short and wide)?
>>>
>>> Thanks
>>> Tim
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] RDKit and GSoC 2022

2022-02-08 Thread Greg Landrum
Dear all,

This year it is once again possible to do longer projects as part of Google
Summer of Code, so I think it makes sense for the RDKit to participate
again.

If you have ideas for projects which can be accomplished with about 350
hours of effort (~30 hours/week for 12 weeks) and are willing to mentor the
person working on the project (note that this no longer has to be a
student), please let me know!

-greg
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] double bond in a ring

2022-01-27 Thread Greg Landrum
Hi Shani,

This is a limitation of the RDKit's 2D coordinate generation code.
The workaround is to use the RDKit's coordgen integration to generate 2D
coordinates. You can toggle this on by doing:

from rdkit.Chem import rdDepictor
rdDepictor.SetPreferCoordGen(True)

That will, in general, give nicer 2D coordinates anyway.

Best regards,
-greg



On Thu, Jan 27, 2022 at 10:37 AM Shani Zev  wrote:

> Hi everyone,
> I'm trying to create coordinates for a molecule containing a TRANS double
> bond in a ring. However, while I try to create coordinate from the mol
> object, the structure that was created is CIS and not TRANS.
> I check both options (cis and trans) and check that bond
> specificity correctly using FindPotentialStereo, then I use MolToMolBlock
> in order to create coordinate, and both structures (cis and trans) are
> created as CIS.
> any ideas/suggestions?
>
> thanks in advance,
> Shani
>
> For example my code:
>
> import rdkit
> from rdkit import Chem
> from rdkit.Chem import AllChem
> print(rdkit.__version__)
>
> smiles = {'Cis':'C/C1=C\1', 'Trans':'C/C1=C/1'}
> for key in smiles:
>     mol = Chem.MolFromSmiles(smiles[key])
>     mol = Chem.AddHs(mol)
>     AllChem.EmbedMolecule(mol)
>     si = Chem.FindPotentialStereo(mol)
>     for element in si:
>         print(f' {key}  Type: {element.type}, Descriptor: {element.descriptor} ')
>     print(Chem.MolToMolBlock(mol), file=open(str(key)+'.mol', 'w+'))
>
> *the output: *
>
> 2021.09.4
>  Cis  Type: Bond_Double, Descriptor: Bond_Cis
>  Trans  Type: Bond_Double, Descriptor: Bond_Trans
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Using SmilesMolSuplier with CSV containing quotemarks

2022-01-10 Thread Greg Landrum
Hi James,

The RDKit does not have a full-featured CSV parser; writing such a thing is
a non-trivial task. If you need to support general CSV, I'd suggest using
pandas or Python's builtin csv module. It may seem like overkill, but
dealing with all the oddness that can show up in CSVs is really not easy.
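
As a rough sketch of the csv-module route: the helper name `read_smiles_column`, the column header, and the sample data below are all made up for illustration, and it assumes the file has a header row naming the SMILES column. The csv module strips the quoting, so the resulting strings can be handed to Chem.MolFromSmiles one at a time.

```python
import csv
import io


def read_smiles_column(text, smiles_header="smiles"):
    """Return the SMILES strings from CSV text, with any quoting removed."""
    try:
        # Let the Sniffer guess the delimiter and quote character
        dialect = csv.Sniffer().sniff(text[:1024])
    except csv.Error:
        # Fall back to the default comma-separated dialect
        dialect = csv.excel
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    return [row[smiles_header] for row in reader]


sample = '"smiles","name"\n"C1=CC=CC=C1","benzene"\n"CCO","ethanol"\n'
smiles = read_smiles_column(sample)
# Each entry can now be passed to Chem.MolFromSmiles(...)
```

This avoids SmilesMolSupplier for the parsing step entirely, at the cost of building the molecules yourself.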

Best,
-greg


On Mon, Jan 10, 2022 at 11:15 AM James Wallace  wrote:

> As the subject suggests, I'm trying to find a universal solution for
> reading CSVs via the SmilesMolSupplier (as the input setup could be single
> column or multiple column, using the pandas tools for interconversion is
> overkill)
>
> The general structure I use for analysing the CSV is:
>
>
> with open(chem_file_name, "r") as csv_upload_file:
>     first_line = csv_upload_file.readline()
>     dialect = sniffer.sniff(first_line)
>     has_header = sniffer.has_header(first_line)
>     csv_upload_file.close()
>
> supplier = Chem.SmilesMolSupplier(chem_file_name,
>     delimiter=str(dialect.delimiter), smilesColumn=smi_col_header,
>     nameColumn=-1, titleLine=has_header)
>
> If I use a CSV without quoted data, this is fine, I can autodetect the
> delimiter, the column header is loaded in by the rest of my workflow,
> everything else is worked out through the CSV sniffer. However, where it is
> quoted data, the actual parsing will fail because of the quotemarks.
>
> [10:09:56] SMILES Parse Error: syntax error for input: '"C1=CC=CC=C1"'
> [10:09:56] ERROR: Smiles parse error on line 1
>
> Is there some easy way of handling this, or do I have to mandate not using
> quoting of data in the CSV generation?
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Problems building from source

2021-12-21 Thread Greg Landrum
Glad to hear it.
And it's helpful if these replies also go to the mailing list so that
others can also see that this was the answer to the question.

Thanks,
-greg


On Tue, Dec 21, 2021 at 10:09 AM Tim Dudgeon  wrote:

> Hi Greg,
> Yes, that seems to have done the trick.
> Thanks!
> Tim
>
> On Tue, Dec 21, 2021 at 6:24 AM Greg Landrum 
> wrote:
>
>> Hi Tim,
>>
>> You probably need to provide the argument "-DBoost_NO_BOOST_CMAKE=TRUE"
>> to cmake.
>>
>> -greg
>>
>>
>> On Mon, Dec 20, 2021 at 6:20 PM Tim Dudgeon 
>> wrote:
>>
>>> I'm hitting problems when building from source on an Ubuntu system.
>>> I've installed the dependencies (e.g. boost) using apt and think I have
>>> everything set properly, but cmake is failing. Early on it says:
>>>
>>> -- Found Boost:
>>> /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found
>>> suitable version "1.71.0", minimum required is "1.56.0") found components:
>>> serialization
>>> == Using strict rotor definition
>>> -- Found Boost:
>>> /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found
>>> suitable version "1.71.0", minimum required is "1.56.0") found components:
>>> system iostreams
>>>
>>> Then a little later it spits out a large number of statements like this
>>>
>>> CMake Error at Code/cmake/Modules/RDKitUtils.cmake:153 (add_executable):
>>>   Target "testCoordGen" links to target "Boost::system" but the target
>>> was
>>>   not found.  Perhaps a find_package() call is missing for an IMPORTED
>>>   target, or an ALIAS target is missing?
>>> Call Stack (most recent call first):
>>>   External/CoordGen/CMakeLists.txt:110 (rdkit_test)
>>>
>>> These are the boost libs I have:
>>>
>>> apt list --installed | grep libboost
>>>
>>> WARNING: apt does not have a stable CLI interface. Use with caution in
>>> scripts.
>>>
>>> libboost-atomic1.71-dev/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed,automatic]
>>> libboost-atomic1.71.0/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed,automatic]
>>> libboost-chrono1.71-dev/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed,automatic]
>>> libboost-chrono1.71.0/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed,automatic]
>>> libboost-date-time1.71-dev/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed,automatic]
>>> libboost-date-time1.71.0/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed,automatic]
>>> libboost-dev/focal,now 1.71.0.0ubuntu2 amd64 [installed]
>>> libboost-iostreams1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed]
>>> libboost-iostreams1.71.0/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed,automatic]
>>> libboost-python1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed]
>>> libboost-python1.71.0/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed,automatic]
>>> libboost-regex1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed]
>>> libboost-regex1.71.0/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed,automatic]
>>> libboost-serialization1.71-dev/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed]
>>> libboost-serialization1.71.0/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed,automatic]
>>> libboost-system1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed]
>>> libboost-system1.71.0/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed,automatic]
>>> libboost-thread1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed]
>>> libboost-thread1.71.0/focal,now 1.71.0-6ubuntu6 amd64
>>> [installed,automatic]
>>> libboost1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed,automatic]
>>>
>>> Any idea what is wrong?
>>>
>>> Tim
>>>
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Problems building from source

2021-12-20 Thread Greg Landrum
Hi Tim,

You probably need to provide the argument "-DBoost_NO_BOOST_CMAKE=TRUE" to
cmake.

-greg


On Mon, Dec 20, 2021 at 6:20 PM Tim Dudgeon  wrote:

> I'm hitting problems when building from source on an Ubuntu system.
> I've installed the dependencies (e.g. boost) using apt and think I have
> everything set properly, but cmake is failing. Early on it says:
>
> -- Found Boost:
> /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found
> suitable version "1.71.0", minimum required is "1.56.0") found components:
> serialization
> == Using strict rotor definition
> -- Found Boost:
> /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found
> suitable version "1.71.0", minimum required is "1.56.0") found components:
> system iostreams
>
> Then a little later it spits out a large number of statements like this
>
> CMake Error at Code/cmake/Modules/RDKitUtils.cmake:153 (add_executable):
>   Target "testCoordGen" links to target "Boost::system" but the target was
>   not found.  Perhaps a find_package() call is missing for an IMPORTED
>   target, or an ALIAS target is missing?
> Call Stack (most recent call first):
>   External/CoordGen/CMakeLists.txt:110 (rdkit_test)
>
> These are the boost libs I have:
>
> apt list --installed | grep libboost
>
> WARNING: apt does not have a stable CLI interface. Use with caution in
> scripts.
>
> libboost-atomic1.71-dev/focal,now 1.71.0-6ubuntu6 amd64
> [installed,automatic]
> libboost-atomic1.71.0/focal,now 1.71.0-6ubuntu6 amd64 [installed,automatic]
> libboost-chrono1.71-dev/focal,now 1.71.0-6ubuntu6 amd64
> [installed,automatic]
> libboost-chrono1.71.0/focal,now 1.71.0-6ubuntu6 amd64 [installed,automatic]
> libboost-date-time1.71-dev/focal,now 1.71.0-6ubuntu6 amd64
> [installed,automatic]
> libboost-date-time1.71.0/focal,now 1.71.0-6ubuntu6 amd64
> [installed,automatic]
> libboost-dev/focal,now 1.71.0.0ubuntu2 amd64 [installed]
> libboost-iostreams1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed]
> libboost-iostreams1.71.0/focal,now 1.71.0-6ubuntu6 amd64
> [installed,automatic]
> libboost-python1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed]
> libboost-python1.71.0/focal,now 1.71.0-6ubuntu6 amd64 [installed,automatic]
> libboost-regex1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed]
> libboost-regex1.71.0/focal,now 1.71.0-6ubuntu6 amd64 [installed,automatic]
> libboost-serialization1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed]
> libboost-serialization1.71.0/focal,now 1.71.0-6ubuntu6 amd64
> [installed,automatic]
> libboost-system1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed]
> libboost-system1.71.0/focal,now 1.71.0-6ubuntu6 amd64 [installed,automatic]
> libboost-thread1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed]
> libboost-thread1.71.0/focal,now 1.71.0-6ubuntu6 amd64 [installed,automatic]
> libboost1.71-dev/focal,now 1.71.0-6ubuntu6 amd64 [installed,automatic]
>
> Any idea what is wrong?
>
> Tim
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Query on a failed molecule from SureChEMBL

2021-12-14 Thread Greg Landrum
Hi Lewis,

Dealing with all the strange chemical representations that show up "in the
wild" is an ongoing struggle.

Your first example is pretty clearly intended to be an azide and we can
certainly add a rule to normalize that one to what the RDKit expects it to
be (there already is a rule for C-N=N#N, but that doesn't help here.). That
won't happen before the next feature release though.

I'm not really sure what the intent was for the two
four-coordinate neutral Ns in the second molecule, so I think it's unlikely
that we'd add a standard cleanup for one.

However! The good news is that there's a pretty easy (and efficient) way to
fix this yourself. We added a new method to chemical reactions in the
2021.09 release which allows you to modify a molecule in place (subject to
some constraints). This is ideal for doing cleanup transformations like
these.

This gist shows how to write reaction rules for your cases (I guessed for
what the Ns are supposed to be) and then use them:
https://gist.github.com/greglandrum/8fd229bc6bf6c734d1c21da7f2bebebb

Hope this helps,
-greg


On Wed, Dec 15, 2021 at 12:21 AM Lewis Martin 
wrote:

> Hi All,
> Reading molecules from a bulk download of SureChEMBL, I come across a fair
> few molecules that fail to parse. Not sure whether they SHOULD parse or
> not.
>
> Here is an example: https://www.surechembl.org/chemical/SCHEMBL386
> with SMILES code: COC(=O)C1=C(C=CC=C1)C1=CC=C(C[N+]#[N]=[N-])C=C1
>
> Even reading the SMILES code one can see that there are too many bonds in
> there - a nitrogen triply bonded and doubly bonded to other atoms.
>
> Another example: https://www.surechembl.org/chemical/SCHEMBL33957
> smiles: NC(N)=[NH]C1=NC(CSCC[NH]=CNS(=O)(=O)C2=CC=C(Br)C=C2)=CS1
>
> Again, valence for a nitrogen is off.
>
> Should I expect to parse these with RDKit? Might there be some way around
> this? It's a significant fraction of the molecules in SureChEMBL.
>
> Thanks team!
> Lewis
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] invalid CTAB substructure query with PostgreSQL cartridge

2021-12-10 Thread Greg Landrum
Hi Susan,

I haven't looked at the Sgroup or unsaturation flags yet (I will try to do
this later today), but a word on the aromaticity/kekulization.

One of the things which is going wrong here at the cartridge level is that
qmol_from_ctab() is sanitizing the molecules it reads in. This is not
correct. qmol_from_ctab is intended to produce a query and queries should
not be sanitized. I will get that fixed and, assuming it doesn't turn up
other problems, we should be able to get that into the next patch release.
Here's that bug report:
https://github.com/rdkit/rdkit/issues/4787

There's another potential issue with how bonds from CTABs are parsed,
inspired by your first message, which we're discussing here:
https://github.com/rdkit/rdkit/issues/4785

Thanks for the very detailed descriptions of the problem!
-greg



On Fri, Dec 10, 2021 at 11:11 AM Susan Leung  wrote:

> Hi Paolo,
>
>
> Thanks very much for filing the bug and for offering the Python
> preprocessing solution.
>
>
> I actually have a few more CTABs that are not valid. One of which raises a
> kekulisation error and like the previous non-ring aromatic atom problem,
> there is an alternative SMARTS query that can be written. I suspect that
> you might suggest some Python preprocessing for this and converting to
> SMARTS?
>
>
> The other two errors, I don’t think are to do with sanitization but
> possibly due to the way the CTAB is read. One has multiple atoms with the
> unsaturated flag turned on. The other has a SGROUP defined.
>
> Please see below, where I tried to summarize and I again attach a ipynb if
> it helps.
>
>
> Thanks very much,
>
>
> Susan
>
>
> Example 2: Kekulization
>
> I want a query CTAB to match the following two tautomers.
>
> sm1 = 'Cc1c[nH]nc1C'
>
> sm2 = 'Cc1cn[nH]c1C'
>
>
>
> I would like to use the following CTAB as the query but it is not valid ,
> I’m guessing it’s because of kekulization, which is the error produced when
> doing MolFromMolBlock:
>
> ctab = """
>   ACCLDraw12082111532D
>
>   7  7  0  0  0  0  0  0  0  0999 V2000
>     9.2840  -12.1344    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
>    10.2367  -11.4422    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
>     9.8729  -10.3217    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>     8.6950  -10.3217    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>     8.3309  -11.4422    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
>     7.1932  -11.7471    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     9.2840  -13.3122    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>   2  1  4  0  0  0  0
>   2  3  4  0  0  0  0
>   3  4  4  0  0  0  0
>   4  5  4  0  0  0  0
>   5  1  4  0  0  0  0
>   5  6  1  0  0  0  0
>   7  1  1  0  0  0  0
> M  END
> """
>
> select is_valid_ctab('{ctab}')
>
>
>
> Returns False
>
> I can make an alternative valid CTAB with a hydrogen on one of the
> nitrogens that is valid, but then it doesn’t match both sm1 and sm2.
>
> ctab_fixed = """
>   ACCLDraw12082111272D
>
>   8  8  0  0  0  0  0  0  0  0999 V2000
>    11.6590  -11.9469    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
>    12.6117  -11.2547    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
>    12.2479  -10.1342    0.0000 N   0  0  3  0  0  0  0  0  0  0  0  0
>    11.0700  -10.1342    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>    10.7059  -11.2547    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
>     9.5682  -11.5596    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    11.6590  -13.1247    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    12.8368   -9.1142    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
>   2  1  4  0  0  0  0
>   2  3  4  0  0  0  0
>   3  4  4  0  0  0  0
>   4  5  4  0  0  0  0
>   5  1  4  0  0  0  0
>   5  6  1  0  0  0  0
>   7  1  1  0  0  0  0
>   3  8  1  0  0  0  0
> M  END
> """
>
>
>
> select mol_from_smiles('{sm1}') @> qmol_from_ctab('{ctab_fixed}')
>
>
>
> Returns True
>
> select mol_from_smiles('{sm2}') @> qmol_from_ctab('{ctab_fixed}')
>
>
>
> Returns False
>
>
>
> However, I can make a qmol from SMARTS that can match with both:
>
> alt_smarts = '[#6]1(:[#6]:[#7]:[#7]:[#6]:1-[#6])-[#6]'
>
>
>
> select mol_from_smiles('{sm1}') @> qmol_from_smarts('{alt_smarts}')
>
>
>
> select mol_from_smiles('{sm2}') @> qmol_from_smarts('{alt_smarts}')
>
>
>
> Return True for both.
>
> ___
>
> Example 3 Unsaturated :
>
> How does RDKit handle the M  UNS line? It can’t seem to handle this CTAB
> for example, where multiple atoms (atom 8 and atom 9) have the unsaturated
> flag on.
>
> ctab_og = """
>   ACCLDraw12082113482D
>
>   9  9  0  0  0  0  0  0  0  0999 V2000
>     2.6030  -22.3675    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     3.6258  -21.7773    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     4.6447  -22.3669    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     4.6447  -23.5480    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>
> 

Re: [Rdkit-discuss] Using EnumerateLibraryFromReaction without fragmenting reactants

2021-12-05 Thread Greg Landrum
Unless you can make your reactant definitions a lot more specific, I think
you're going to have to do the post-processing.
The RDKit just uses all possible substructure matches of the reactant
templates you provide... it makes no attempt to determine which is "best".

-greg


On Sun, Dec 5, 2021 at 8:24 PM James Wallace  wrote:

> Here's the specific example that I was referring to:
>
> I used a reaction with this SMILES, making a ChemicalReaction out of it
> using the add reactant templates:
>
> [image: image.png]
> N[CH3:1].O=C(O)C(c1c1)C11>>O=C(N[CH3:1])C(c1c1)C11
>
> Taking two reagents to fit it:
> [image: image.png]
> OC(=O)C(C11)c1c1
>
> [image: image.png]
> NC(=O)Nc1ccc(C(=O)O)cc1
>
> Gives us products
>
> [image: image.png]
> O=C(NC(=O)C(c1c1)C11)Nc1ccc(C(=O)O)cc1
>
> [image: image.png]
> NC(=O)NC(=O)C(c1c1)C11
>
> [image: image.png]
> O=C(O)c1ccc(NC(=O)C(c2c2)C22)cc1
>
> In my case, I want to just see the result that uses the most of the
> reagent in each case, and not the ones where the reagent is fragmented.
>
> Is that possible directly, or do I need to filter my results post hoc?
>
> On Thu, 2 Dec 2021 at 20:04, Paolo Tosco 
> wrote:
>
>> Hi James,
>>
>> I am not quite sure I understand what you have done and what you'd like
>> to achieve.
>> Ideally, could you please post:
>>
>> * the reaction you are using
>> * some example reactants
>> * the desired product(s)
>> * the undesired product(s)
>>
>> Thanks, cheers
>> p.
>>
>> On Thu, Dec 2, 2021 at 6:03 PM James Wallace 
>> wrote:
>>
>>> Hi,
>>> I've been working with the EnumerateLibraryFromReaction function to
>>> generate some quick access molecule libraries, using standard lists of
>>> reagents.
>>>
>>> However, when I get the output back, I notice that I get results that
>>> chop up the reactants I use wherever they match the reaction rule, so for a
>>> rule looking for NH I would effectively get the yellow ringed area here
>>> used as a structure as well as the desired red one.
>>> [image: image.png]
>>> As well, since this contains acid character, it would also match there,
>>> but I accept that I can't necessarily remove that.
>>>
>>> Is there any way I can filter out the partial usages, or am I stuck with
>>> using the substructures as well?
>>>
>>> Thanks,
>>>
>>> James
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>


Re: [Rdkit-discuss] Trouble Running Rdkit Docker Locally

2021-11-30 Thread Greg Landrum
Hi Jessica,

That Dockerfile isn't written to be used interactively, it's just set up to
build the wrappers and then copy the resulting build artifacts + the demo
files to the building machine.
So if I run it like this:

DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile --build-arg
RDKIT_BRANCH=master -o /tmp/rdkit_tmp .


I end up with the output in my /tmp/rdkit_tmp directory (which needs to
exist before I run the docker build command):

/scratch/RDKit_git/Code/MinimalLib$ ls -l /tmp/rdkit_tmp/
total 4852
-rw-r--r-- 1 glandrum glandrum    7077 Nov 30 09:10 demo.html
-rw-r--r-- 1 glandrum glandrum   10380 Nov 30 09:10 GettingStartedInJS.html
-rw-r--r-- 1 glandrum glandrum  143844 Nov 30 09:15 RDKit_minimal.js
-rwxr-xr-x 1 glandrum glandrum 4798348 Nov 30 09:15 RDKit_minimal.wasm


Apologies that this isn't documented anywhere.
-greg


On Tue, Nov 30, 2021 at 8:04 AM Jessica Heston 
wrote:

> Hi,
>
> I'm trying to run the dockerfile of the minimalLib folder. I successfully
> created an image locally.  But when I try to run it using this command,
> docker run -d -p 80:80 rdkitjavascript,  I get "docker: Error response from
> daemon: No command specified".  Can you help figure out how to fix this?
>
> Thanks,
> Jessica


Re: [Rdkit-discuss] Issue building RDKit from source with Conda

2021-10-22 Thread Greg Landrum
Excellent, I'm glad everything worked.

I'll make a note that we need to update those instructions to make this
easier.

-greg


On Fri, Oct 22, 2021 at 9:51 AM Gonzalo Colmenarejo <
colmenarejo.gonz...@gmail.com> wrote:

> Hi Greg et al,
>
> I finally passed all the tests after defining export
> LD_LIBRARY_PATH=$RDBASE/lib (this fixed all the tests but
> pythonTestDirChem; in turn, this was fixed by installing pandas).
>
> Thanks for all your help
>
> Gonzalo
>
> On Thu, Oct 21, 2021 at 5:28 PM Gonzalo Colmenarejo <
> colmenarejo.gonz...@gmail.com> wrote:
>
>> Hi Greg,
>>
>> After setting RDBASE and PYTHONPATH I get a much reduced set of errors
>> with ctest, but some tests still fail. In all cases, the output on
>> failure looks like this:
>>
>> Traceback (most recent call last):
>>   File "/home/gonzalo/rdkit/Code/GraphMol/Depictor/Wrap/testDepictor.py",
>> line 12, in <module>
>>     from rdkit import Chem
>>   File "/home/gonzalo/rdkit/rdkit/__init__.py", line 2, in <module>
>>     from . import rdBase
>> ImportError: /home/gonzalo/rdkit/rdkit/rdBase.so: undefined symbol:
>> _ZN5RDLog9BlockLogsC1Ev
>>
>> And if I run ldd rdkit/rdBase.so I get:
>>
>> linux-vdso.so.1 (0x7ffc4f922000)
>> libgtk3-nocsd.so.0 => /usr/lib/x86_64-linux-gnu/libgtk3-nocsd.so.0
>> (0x7f1c495a8000)
>> libRDKitRDBoost.so.1 =>
>> /home/gonzalo/rdkit/build/lib/libRDKitRDBoost.so.1 (0x7f1c4959c000)
>> libRDKitRDGeneral.so.1 =>
>> /home/gonzalo/rdkit/build/lib/libRDKitRDGeneral.so.1 (0x7f1c4957a000)
>> libboost_python38.so.1.73.0 =>
>> /home/gonzalo/anaconda3/envs/rdksc/lib/libboost_python38.so.1.73.0
>> (0x7f1c4953c000)
>> libstdc++.so.6 => /home/gonzalo/anaconda3/envs/rdksc/lib/libstdc++.so.6
>> (0x7f1c49391000)
>> libgcc_s.so.1 => /home/gonzalo/anaconda3/envs/rdksc/lib/libgcc_s.so.1
>> (0x7f1c4937a000)
>> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f1c49188000)
>> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f1c49182000)
>> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
>> (0x7f1c4915f000)
>> /lib64/ld-linux-x86-64.so.2 (0x7f1c498af000)
>> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f1c49154000)
>> libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x7f1c4914f000)
>> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f1c48ffe000)
>>
>> Do you know how this can be fixed?
>>
>> Thanks a lot
>>
>> Gonzalo
>>
>> On Tue, Oct 19, 2021 at 9:42 AM Gonzalo Colmenarejo <
>> colmenarejo.gonz...@gmail.com> wrote:
>>
>>> Thanks Greg.
>>>
>>> What then should I use as $RDBASE? The path for the rdkit directory
>>> created after the git clone?
>>>
>>> Thanks a lot
>>>
>>> Gonzalo
>>>
>>> On Tue, Oct 19, 2021 at 6:53 AM Greg Landrum 
>>> wrote:
>>>
>>>> Hi Gonzalo,
>>>>
>>>> These failures look like this:
>>>>
>>>>   2/198 Test   #2: pyCoordGen .***Failed
>>>>  0.04 sec
>>>> Traceback (most recent call last):
>>>>   File "/home/gonzalo/rdkit/External/CoordGen/Wrap/testCoordGen.py",
>>>> line 13, in <module>
>>>>     from rdkit.Chem import rdCoordGen, rdMolAlign
>>>> ModuleNotFoundError: No module named 'rdkit'
>>>>
>>>>
>>>> That's an indication that you don't have your PYTHONPATH set correctly.
>>>> It should include the $RDBASE directory (make sure that RDBASE is also set
>>>> correctly).
>>>>
>>>> The documentation doesn't include this... we'll fix that.
>>>>
>>>> -greg
>>>>
>>>>
>>>>
>>>> On Mon, Oct 18, 2021 at 4:53 PM Gonzalo Colmenarejo <
>>>> colmenarejo.gonz...@gmail.com> wrote:
>>>>
>>>>> Hi Greg et al.,
>>>>>
>>>>> Please find attached the results of ctest --output-on-failure and
>>>>> cmake. I followed the instructions in
>>>>> https://www.rdkit.org/docs/Install.html, section "How to install from
>>>>> source with Conda/Linux x86_64: Python 3 environment".
>>>>>
>>>>> Thanks a lot
>>>>>
>>>>> Gonzalo
>>>>>
>>>>> On Sat, Oct 9, 2021 at 3:55 PM Greg Landrum 
>>>>> wrote:
>>>>>
>>>>>> Hi Gonza

Re: [Rdkit-discuss] Issue building RDKit from source with Conda

2021-10-18 Thread Greg Landrum
Hi Gonzalo,

These failures look like this:

  2/198 Test   #2: pyCoordGen .***Failed
 0.04 sec
Traceback (most recent call last):
  File "/home/gonzalo/rdkit/External/CoordGen/Wrap/testCoordGen.py", line
13, in <module>
    from rdkit.Chem import rdCoordGen, rdMolAlign
ModuleNotFoundError: No module named 'rdkit'


That's an indication that you don't have your PYTHONPATH set correctly. It
should include the $RDBASE directory (make sure that RDBASE is also set
correctly).
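
As a quick way to check this from Python itself, here is a minimal sketch.
The helper name check_rdkit_env is hypothetical (it is not part of the RDKit),
and it only assumes what is described above: RDBASE should be set and should
also appear on the Python path.

```python
import os
import sys


def check_rdkit_env(environ=None, path=None):
    """Return a list of problems found with RDBASE and the Python path.

    Hypothetical helper for illustration; it is not part of the RDKit.
    """
    environ = os.environ if environ is None else environ
    path = sys.path if path is None else path

    rdbase = environ.get("RDBASE")
    if not rdbase:
        return ["RDBASE is not set"]

    def norm(p):
        # Compare directories in a normalised, absolute form
        return os.path.normpath(os.path.abspath(p))

    problems = []
    if not os.path.isdir(rdbase):
        problems.append("RDBASE does not point to an existing directory")
    if norm(rdbase) not in {norm(p) for p in path if p}:
        problems.append("RDBASE is not on the Python path (check PYTHONPATH)")
    return problems


if __name__ == "__main__":
    for problem in check_rdkit_env():
        print(problem)
```

An empty list means both variables look consistent; it says nothing about
whether the shared libraries themselves can be found.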

The documentation doesn't include this... we'll fix that.

-greg



On Mon, Oct 18, 2021 at 4:53 PM Gonzalo Colmenarejo <
colmenarejo.gonz...@gmail.com> wrote:

> Hi Greg et al.,
>
> Please find attached the results of ctest --output-on-failure and cmake. I
> followed the instructions in https://www.rdkit.org/docs/Install.html,
> section "How to install from source with Conda/Linux x86_64: Python 3
> environment".
>
> Thanks a lot
>
> Gonzalo
>
> On Sat, Oct 9, 2021 at 3:55 PM Greg Landrum 
> wrote:
>
>> Hi Gonzalo,
>>
>> The message you show below is just a warning, not an actual error.
>> Do you get actual compilation errors? If so please share them.
>>
>> Try running the tests with:
>> ctest --output-on-failure
>> and sharing the error messages you see.
>>
>> Best,
>> -greg
>>
>>
>> On Fri, Oct 8, 2021 at 1:49 PM Gonzalo Colmenarejo <
>> colmenarejo.gonz...@gmail.com> wrote:
>>
>>> Hi,
>>> I'm having issues trying to build RDKit from source with Conda using the
>>> recipe in the RDKit web page. The build apparently completes, but only
>>> 35% of the tests pass under ctest. I'm using an Ubuntu 20
>>> workstation.
>>>
>>> I first generate a Conda environment with all the required stuff:
>>>
>>> conda create --name rdksc python==3.8.1 cmake cairo pillow eigen
>>> pkg-config boost boost-cpp py-boost gxx_linux-64 numpy
>>>
>>> After cloning the git repository then I run cmake (following the
>>> instructions):
>>>
>>> cmake -DPy_ENABLE_SHARED=1 -DRDK_INSTALL_INTREE=ON
>>> -DRDK_INSTALL_STATIC_LIBS=OFF -DRDK_BUILD_CPP_TESTS=ON
>>> -DPYTHON_NUMPY_INCLUDE_PATH="$(python -c 'import numpy ;
>>> print(numpy.get_include())')" -DBOOST_ROOT="$CONDA_PREFIX" ..
>>>
>>> Finally I run make and make install, and the build completes but with
>>> a series of messages. Only 35% of the tests pass under ctest.
>>>
>>> The messages I get in compilation are like these:
>>>
>>> In file included from
>>> /home/gonzalo/anaconda3/envs/rdksc/include/boost/bind.hpp:30,
>>>  from
>>> /home/gonzalo/anaconda3/envs/rdksc/include/boost/python/exception_translator.hpp:10,
>>>  from
>>> /home/gonzalo/anaconda3/envs/rdksc/include/boost/python.hpp:28,
>>>  from /home/gonzalo/rdkit/Code/RDBoost/python.h:3,
>>>  from
>>> /home/gonzalo/rdkit/Code/ChemicalFeatures/Wrap/FreeChemicalFeature.cpp:12:
>>> /home/gonzalo/anaconda3/envs/rdksc/include/boost/bind.hpp:36:1: note:
>>> #pragma message: The practice of declaring the Bind placeholders (_1, _2,
>>> ...) in the global namespace is deprecated. Please use
>>>  + using namespace boost::placeholders, or define
>>> BOOST_BIND_GLOBAL_PLACEHOLDERS to retain the current behavior.
>>>36 | BOOST_PRAGMA_MESSAGE(
>>>   | ^~~~
>>>
>>> I'd really appreciate any help in getting this fixed and in understanding
>>> why this message is showing up.
>>>
>>> Thanks a lot in advance
>>>
>>> Gonzalo
>>>


Re: [Rdkit-discuss] 2D Pharmacophore fingerprints slow or stuck

2021-10-14 Thread Greg Landrum
Hi Anthony,

Can you share the molecule that you're having trouble with?

As for the pharmacophore definitions: we don't have a great set of
definitions available, but I normally suggest using either the ones in
RDConfig.RDDataDir+'/BaseFeatures.fdef' or the feature factory which is
available from Chem.Pharm2D.Gobbi_Pharm2D:

from rdkit.Chem.Pharm2D import Gobbi_Pharm2D

sigFactory = Gobbi_Pharm2D.factory
sigFactory.SetBins([(0,2),(2,5),(5,8)])
sigFactory.Init()
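
For completeness, a minimal sketch of generating a fingerprint with that
factory. It uses the factory with its default bins rather than the SetBins()
call above, and the example SMILES is only an illustration, not the molecule
from your data set:

```python
from rdkit import Chem
from rdkit.Chem.Pharm2D import Generate, Gobbi_Pharm2D

# Illustrative molecule only; substitute your own
mol = Chem.MolFromSmiles("OCC(=O)OCCCN")

# Use the pre-built Gobbi signature factory as-is (default distance bins)
factory = Gobbi_Pharm2D.factory
fp = Generate.Gen2DFingerprint(mol, factory)

print(fp.GetNumBits(), fp.GetNumOnBits())
```

If this returns quickly for a small molecule but not for yours, that points
at the molecule (size, number of features) rather than at the setup.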



Tutorials: the only one I'm aware of is the material in the Getting Started
guide:
https://www.rdkit.org/docs/GettingStartedInPython.html#d-pharmacophore-fingerprints
but I suspect you've found that already

Best,
-greg


On Wed, Oct 13, 2021 at 10:39 PM Anthony Nash 
wrote:

> Dear all,
>
> I'm afraid that I'm struggling to understand precisely how to build a 2D
> pharmacophore fingerprint. I thought I had it, but my code sits in a
> running state indefinitely.
>
> This is the code:
>
> fdefNameStr: str = "MinimalFeatures.fdef"
> featFactory = ChemicalFeatures.BuildFeatureFactory(fdefNameStr)
> sigFactory = SigFactory(featFactory, minPointCount=2, maxPointCount=3,
> trianglePruneBins=False)
> sigFactory.SetBins([(0,2),(2,5),(5,8)])
> sigFactory.Init()
> sigFactory.GetSignature()
> pharmFPList = []
> for keyCase in caseDrugDictionary:
>     caseDrug = caseDrugDictionary[keyCase]
>     mol = caseDrug.getRDKitMol()
>     drugNameStr = caseDrug.getDrugName()
>     pharmFP = Generate.Gen2DFingerprint(mol, sigFactory)
>     pharmFPList.append(pharmFP)
>
> Firstly, the code runs indefinitely (2 hrs and still running) when it
> executes: pharmFP = Generate.Gen2DFingerprint(mol, sigFactory)
>
> Secondly, I would be extremely grateful if someone could explain whether
> "MinimalFeatures.fdef" is the right file? I downloaded it from GitHub, but
> I'm uncertain of its use. I've gone through the RDKit API and RDKit book.
> Are there any descriptive RDKit Pharmacophore tutorials?
>
> Many thanks
> Anthony
>
>


Re: [Rdkit-discuss] Issue building RDKit from source with Conda

2021-10-09 Thread Greg Landrum
Hi Gonzalo,

The message you show below is just a warning, not an actual error.
Do you get actual compilation errors? If so please share them.

Try running the tests with:
ctest --output-on-failure
and sharing the error messages you see.

Best,
-greg


On Fri, Oct 8, 2021 at 1:49 PM Gonzalo Colmenarejo <
colmenarejo.gonz...@gmail.com> wrote:

> Hi,
> I'm having issues trying to build RDKit from source with Conda using the
> recipe in the RDKit web page. The build apparently completes, but only
> 35% of the tests pass under ctest. I'm using an Ubuntu 20
> workstation.
>
> I first generate a Conda environment with all the required stuff:
>
> conda create --name rdksc python==3.8.1 cmake cairo pillow eigen
> pkg-config boost boost-cpp py-boost gxx_linux-64 numpy
>
> After cloning the git repository then I run cmake (following the
> instructions):
>
> cmake -DPy_ENABLE_SHARED=1 -DRDK_INSTALL_INTREE=ON
> -DRDK_INSTALL_STATIC_LIBS=OFF -DRDK_BUILD_CPP_TESTS=ON
> -DPYTHON_NUMPY_INCLUDE_PATH="$(python -c 'import numpy ;
> print(numpy.get_include())')" -DBOOST_ROOT="$CONDA_PREFIX" ..
>
> Finally I run make and make install, and the build completes but with a
> series of messages. Only 35% of the tests pass under ctest.
>
> The messages I get in compilation are like these:
>
> In file included from
> /home/gonzalo/anaconda3/envs/rdksc/include/boost/bind.hpp:30,
>  from
> /home/gonzalo/anaconda3/envs/rdksc/include/boost/python/exception_translator.hpp:10,
>  from
> /home/gonzalo/anaconda3/envs/rdksc/include/boost/python.hpp:28,
>  from /home/gonzalo/rdkit/Code/RDBoost/python.h:3,
>  from
> /home/gonzalo/rdkit/Code/ChemicalFeatures/Wrap/FreeChemicalFeature.cpp:12:
> /home/gonzalo/anaconda3/envs/rdksc/include/boost/bind.hpp:36:1: note:
> #pragma message: The practice of declaring the Bind placeholders (_1, _2,
> ...) in the global namespace is deprecated. Please use
>  + using namespace boost::placeholders, or define
> BOOST_BIND_GLOBAL_PLACEHOLDERS to retain the current behavior.
>36 | BOOST_PRAGMA_MESSAGE(
>   | ^~~~
>
> I'd really appreciate any help in getting this fixed and in understanding
> why this message is showing up.
>
> Thanks a lot in advance
>
> Gonzalo
>


Re: [Rdkit-discuss] Beta of 2021.09.1 release available.

2021-10-08 Thread Greg Landrum
Hi Eric,

No real reason: I had problems doing the automated build with python 3.9
this morning and didn't have time to investigate.
The conda-forge build process does do python 3.9 builds of the RDKit, so
there's no fundamental problem.

Note that I no longer do release builds on the rdkit channel. Dealing with
all the different operating systems and python versions had become a bunch
of work and the conda-forge versions work really well.

-greg


On Fri, Oct 8, 2021 at 4:06 PM Eric Jonas  wrote:

> Thanks Greg, this is incredible as always!
>
> Is there any impediment to also doing conda builds for python 3.9? more
> distros/environments use 3.9 now and it's still nice to be able to use the
> rdkit channel instead of having to go full conda-forge.
>
> If there's any technical reason (or if it's a lot of work) then don't
> worry about it, but I was hoping it might just be an extra build
> configuration option in your build scripts :)
>
> ...E
>
>
>
> On Fri, Oct 8, 2021 at 6:42 AM Greg Landrum 
> wrote:
>
>> Dear all,
>>
>> I tagged the first beta of the 2021.09 RDKit release this morning.
>> Assuming nothing weird shows up during testing, we'll do the actual release
>> on the 18th.
>>
>> You can find the new beta here:
>> https://github.com/rdkit/rdkit/releases/tag/Release_2021_09_1b1
>>
>> Conda builds of the beta are available in the rdkit channel for python
>> 3.7 and 3.8 on Linux:
>> conda install -c rdkit/label/beta rdkit
>>
>> I also did linux builds of the postgresql cartridge for postgresql v10,
>> v11, and v12:
>> conda install -c rdkit/label/beta rdkit-postgresql
>>
>> Please try out the beta and let us know if you find any problems!
>>
>> Best regards,
>> -greg


[Rdkit-discuss] Beta of 2021.09.1 release available.

2021-10-08 Thread Greg Landrum
Dear all,

I tagged the first beta of the 2021.09 RDKit release this morning. Assuming
nothing weird shows up during testing, we'll do the actual release on the
18th.

You can find the new beta here:
https://github.com/rdkit/rdkit/releases/tag/Release_2021_09_1b1

Conda builds of the beta are available in the rdkit channel for python 3.7
and 3.8 on Linux:
conda install -c rdkit/label/beta rdkit

I also did linux builds of the postgresql cartridge for postgresql v10,
v11, and v12:
conda install -c rdkit/label/beta rdkit-postgresql

Please try out the beta and let us know if you find any problems!

Best regards,
-greg


Re: [Rdkit-discuss] SVG depiction with fonts?

2021-10-04 Thread Greg Landrum
It would have helped if I'd gotten the sense of the arguments correct...
sorry about that.

What I should have typed is:
d = Draw.MolDraw2DSVG(350,300,-1,-1,True)
The "True" argument disables FreeType

-greg

On Mon, Oct 4, 2021 at 4:38 PM Geoffrey Hutchison 
wrote:

> > Unfortunately there are not keyword arguments for this (something for me
> to fix ASAP), but you can do this as follows:
> > d = Draw.MolDraw2DSVG(350,300,-1,-1,False)
>
> Perfect, thanks! Maybe the keyword can be `useFreeType = False` or
> something along those lines.
>
> -Geoff
>
>


Re: [Rdkit-discuss] SVG depiction with fonts?

2021-09-30 Thread Greg Landrum
Hi Geoff,

you need to disable the use of freetype when you create the MolDraw2DSVG
object.
Unfortunately there are not keyword arguments for this (something for me to
fix ASAP), but you can do this as follows:
d = Draw.MolDraw2DSVG(350,300,-1,-1,False)

That last "False" turns off FreeType and uses normal SVG text.

I hope this helps,
-greg


On Tue, Sep 28, 2021 at 7:10 PM Geoffrey Hutchison <
geoff.hutchi...@gmail.com> wrote:

> Hi all,
>
> I recently upgraded to RDKit 2021.3 from the March 2020 version. With last
> year's release, I was able to tweak the generated SVG depictions to replace
> characters (e.g., where we used "*" in a SMILES but really wanted "M" for a
> metal center) or change the font-weight and font-size.
>
> svg.replace("font-weight:normal", "font-weight:bold")
>
> Now it seems as if the characters are turned into strokes. Is there an
> option to turn this off and go back to SVG characters with font-weight
> attributes?
>
> Thanks,
> -Geoff
>
> ---
> Prof. Geoffrey Hutchison
> Department of Chemistry
> University of Pittsburgh
> tel: (412) 648-0492
> email: geo...@pitt.edu
> twitter: @ghutchis
> web: https://hutchison.chem.pitt.edu/
>
>
>


Re: [Rdkit-discuss] Reaction SMARTS ring properties

2021-09-13 Thread Greg Landrum
Hi Mark,

You haven't shown how the molecules in reagentsRdMolList are constructed,
but from the error message I guess that they have been created without
running any of the sanitization code.
The example transformation you show includes ring information, so it will
generate errors if the molecules you run through the reaction don't have
the ring information pre-computed.
If you don't care about ring size information, the fastest way to get the
required info is to call MolOps::fastFindRings() on the molecules in
reagentsRdMolList before passing them to rxn->runReactants().
If you include ring size (or ring count) information in your
transformations, then using MolOps::symmetrizeSSSR() is a better idea.

I hope that helps,
-greg


On Tue, Sep 14, 2021 at 12:33 AM Mark Mackey via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hi all,
>
>
>
> I’m trying to run some chemical reactions using the C++ API. Doing
> something like
>
>
>
> text = “
> [cH1:1]1:[c:2](-[CH2:7]-[CH2:8]-[NH2:9]):[c:3]:[c:4]:[c:5]:[c:6]:1.[#6:11]-[CH1;R0:10]=[OD1]>>[c:1]12:[c:2](-[CH2:7]-[CH2:8]-[NH1:9]-[C:10]-2(-[#6:11])):[c:3]:[c:4]:[c:5]:[c:6]:1c1cc(CCN)ccc1”;
>
> auto rxn =RDKit::RxnSmartsToChemicalReaction(text, nullptr, false);
>
> rxn->initReactantMatchers();
>
> auto reactionProducts = rxn->runReactants(reagentsRdMolList, 0);
>
>
>
> works fine for trivial transformations, but throws “RingInfo not
> initialized” for anything more complex. Googling has shown that the problem
> is that the ring information is not automatically calculated for the SMARTS
> molecules, but I’m failing to work out what I need to call
> RDKit::MolOps::findSSSR on and when. All pointers appreciated.
>
>
>
> Regards,
>
> Mark
>
>
>
>
>
> *Dr Mark Mackey Chief Scientific Officer *
>
> *Cresset *
>
>
>
>
>


Re: [Rdkit-discuss] Does input format matter?

2021-08-03 Thread Greg Landrum
Hi Leon,

Any existing conformers of the molecule are ignored when generating new
conformers, so as long as the atom ordering is the same and you use the
same random-number seed you should get the same conformers whether you
start from 3D coordinates, 2D coordinates, or no coordinates.
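
A minimal sketch of how one could check this, assuming a small example
molecule and an arbitrary seed (both chosen only for illustration). The
helper embeds the same SMILES twice and compares the coordinates:

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem


def embed(smiles, seed=42, num_confs=3):
    """Embed conformers with a fixed seed and return their coordinates."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    cids = AllChem.EmbedMultipleConfs(mol, numConfs=num_confs, randomSeed=seed)
    return np.array([mol.GetConformer(cid).GetPositions() for cid in cids])


# Same atom ordering and same seed for both runs
a = embed("CCCO")
b = embed("CCCO")
print(a.shape == b.shape and np.allclose(a, b))
```

To compare an SDF against a SMILES this way, the two molecules first need
the same atom ordering; otherwise the coordinates are not expected to match.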

-greg


On Tue, Aug 3, 2021 at 6:26 AM topgunhaides  wrote:

> Hi Greg,
>
> Thanks a lot for the explanation. Sorry for the late response as I
> was doing more tests.
> So I guess that an SDF with 3D coordinates and an SDF with only 2D
> coordinates would also give me the same results, as long as both use the
> same atom ordering and random number seed?
> To put it another way, do the 3D coordinates in the SDF input give any
> advantage over the 2D SDF in producing higher-quality conformers? Or
> are the results also only atom-order and random-seed dependent?
>
> I was using SDF with experimental 3D coordinates as input to generate
> conformers and reproduce that experimental conformation. Although RDKit's
> ETKDG is stochastic, I want to make sure that the input with some
> predefined information (like experimental 3D coordinates in SDF or stereo
> info in SMILES string) will not affect the results or give biased results.
> Thank you!
>
> Best,
> Leon
>
>
> On Tue, Jul 27, 2021 at 12:59 AM Greg Landrum 
> wrote:
>
>> Hi Leon,
>>
>> The results of EmbedMultipleConfs() have a random component and are
>> atom-order dependent. The first of these will cause you to get different
>> results across runs even if you use exactly the same molecule unless you
>> set the random number seed.
>> The atom-order dependence means that if the SMILES and SDF have different
>> atom orderings you would get different conformers even if you set the same
>> random number seed. If the atom orderings are the same, however, you should
>> get the same results.
>>
>> I hope this helps,
>> -greg
>>
>>
>> On Mon, Jul 26, 2021 at 5:04 PM topgunhaides 
>> wrote:
>>
>>> Hi guys,
>>>
>>> By calling "EmbedMultipleConfs", are different results expected by
>>> changing from one input format to another? Like change it from a SDF (with
>>> 3D coordinates) to a SMILES string?
>>> Thank you!
>>>
>>> Leon
>>>


Re: [Rdkit-discuss] Using the bitsPerPoint argument of ShapeTanimotoDist

2021-07-27 Thread Greg Landrum
Hi Lewis,

This looks odd to me as well, but I don't have a quick answer to
explain/account for it.
I'll try and take a look in the near future.

-greg


On Wed, Jul 21, 2021 at 11:57 PM Lewis Martin 
wrote:

> Hi RDKit,
> How does one input the number of bits to the ShapeTanimotoDist function?
> The docs indicate the default is
> rdkit.DataStructs.cDataStructs.DiscreteValueType.TWOBITVALUE,
> but I tried some other values and this gave unexpected results.
> Specifically: when increasing to higher bit values, the tanimoto similarity
> gets quite small, whereas I assumed increasing the sampled bits would
> simply improve the precision of the calculation.
>
> example:
> from rdkit import Chem
> from rdkit.Chem import AllChem  # needed for EmbedMultipleConfs
> from rdkit.DataStructs import TWOBITVALUE, FOURBITVALUE, EIGHTBITVALUE
> from rdkit.Chem.rdShapeHelpers import ShapeTanimotoDist
>
> testmol = Chem.MolFromSmiles('')
> testmolH = Chem.AddHs(testmol)
> AllChem.EmbedMultipleConfs(testmolH, 2)
> for bits in [TWOBITVALUE, FOURBITVALUE, EIGHTBITVALUE]:
>     print(ShapeTanimotoDist(testmolH, testmolH, 0, 1, bitsPerPoint=bits))
>
> output:
>
> 0.38089171974522296
> 0.14635701022642242
> 0.00452852989652238
>
>
>
> Thanks!
> Lewis


Re: [Rdkit-discuss] Does input format matter?

2021-07-26 Thread Greg Landrum
Hi Leon,

The results of EmbedMultipleConfs() have a random component and are
atom-order dependent. The first of these will cause you to get different
results across runs even if you use exactly the same molecule unless you
set the random number seed.
The atom-order dependence means that if the SMILES and SDF have different
atom orderings you would get different conformers even if you set the same
random number seed. If the atom orderings are the same, however, you should
get the same results.

I hope this helps,
-greg


On Mon, Jul 26, 2021 at 5:04 PM topgunhaides  wrote:

> Hi guys,
>
> By calling "EmbedMultipleConfs", are different results expected by
> changing from one input format to another? Like change it from a SDF (with
> 3D coordinates) to a SMILES string?
> Thank you!
>
> Leon
>


Re: [Rdkit-discuss] Javascript MinimalLib

2021-07-21 Thread Greg Landrum
Hi Dave,

It's not in the JS interface yet, but I'll add it now.

-greg


On Mon, Jul 19, 2021 at 4:57 PM David Cosgrove 
wrote:

> Hi,
>
> In this blogpost
> https://greglandrum.github.io/rdkit-blog/technical/2021/05/01/rdkit-cffi-part1.html,
> Greg mentions the CFFI function get_json().  Is that exposed in the JS
> MinimalLIb, and if so, how would I use it?  I see all sorts of good stuff
> in cffiwrapper.h, but I can't work out how to call them from JS.
>
> Thanks,
> Dave
>
>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>


Re: [Rdkit-discuss] Substructure search for an aldehyde returns ketones and acids

2021-07-21 Thread Greg Landrum
Yeah, this is exactly the case where using qmol_from_ctab() should help.

Below is a short example demonstrating this by querying my local ChEMBL
instance. Notice that the first form of the query, which uses
mol_from_ctab() matches what you describe: the results include amides,
esters, etc. The second query, which uses qmol_from_ctab(), only returns
molecules which have an aldehyde.
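
The same effect can be sketched in Python. This is only a rough analogue of
the cartridge behaviour, under the assumption that Chem.MergeQueryHs() plays
the role of qmol_from_ctab() here; the SMILES strings are illustrative:

```python
from rdkit import Chem

targets = {
    "aldehyde": Chem.MolFromSmiles("CC=O"),
    "ketone": Chem.MolFromSmiles("CC(C)=O"),
    "acid": Chem.MolFromSmiles("CC(=O)O"),
}

# Normal parsing removes the explicit H, leaving a plain C-C=O query
plain_query = Chem.MolFromSmiles("[H]C(C)=O")

# Keep the explicit H by skipping sanitization, then merge it into its
# neighbouring carbon as an H-count query
h_query = Chem.MergeQueryHs(Chem.MolFromSmiles("[H]C(C)=O", sanitize=False))

for name, mol in targets.items():
    print(name, mol.HasSubstructMatch(plain_query), mol.HasSubstructMatch(h_query))
```

The plain query is expected to hit all three molecules, while the query with
the merged H should only hit the aldehyde.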

I hope this helps,
-greg

chembl_28=# select * from rdk.mols where m@>mol_from_ctab('aldehyde query
  MJ192500

  4  3  0  0  0  0  0  0  0  0999 V2000
   -2.8123    1.5508    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.5267    1.1383    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.2412    1.5508    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.5267    0.3133    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  2  4  2  0  0  0  0
  2  3  1  0  0  0  0
M  END
') limit 5;
 molregno |   m
----------+------------------------------------------------------------
   310993 | O=C(NO)c1cc(CS(=O)(=O)c2ccc(Cl)cc2)on1
   310992 | O=C(NO)c1cc(CS(=O)(=O)c2(Cl)c2)on1
   318822 | CCC(NC(=O)C[C@H](N)C(=O)N1CCC[C@H]1C#N)c1c1
   310016 | O=C(CCNC(=O)c1c1)NC1CCN(Cc2ccc(Cl)cc2)C1
   319381 | CCOC(=O)/C=C/c1ccc(CN(C(=O)C2C2)c2(/C=C/C(=O)OC)c2)cc1
(5 rows)

chembl_28=# select * from rdk.mols where m@>qmol_from_ctab('aldehyde query
  MJ192500

  4  3  0  0  0  0  0  0  0  0999 V2000
   -2.8123    1.5508    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.5267    1.1383    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.2412    1.5508    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.5267    0.3133    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  2  4  2  0  0  0  0
  2  3  1  0  0  0  0
M  END
') limit 5;
 molregno | m
----------+------------------------------------------------------------
   284772 | COC(=O)NC1[C@H](C)O[C@@H](O[C@H]2C/C=C(\C)[C@@H]3C=C[C@@H]4[C@
@H](O)[C@@H](C)C[C@H](C)[C@H]4[C@]3(C)/C(O)=C3\C(=O)O[C@]4(CC(C=O)=C[C@H
](OC(C)=O)[C@H]4/C=C\2C)C3=O)CC1(C)[N+](=O)[O-]
   284633 | COC(=O)NC1[C@H](C)O[C@@H](O[C@H]2C/C=C(\C)[C@@H]3C=C[C@@H]4[C@
@H](O[C@H]5O5)[C@@H](C)C[C@H](C)[C@H]4[C@]3(C)/C(O)=C3\C(=O)O[C@
]4(CC(C=O)=C[C@H](OC(C)=O)[C@H]4/C=C\2C)C3=O)CC1(C)[N+](=O)[O-]
   284865 | COC(=O)NC1[C@H](C)O[C@@H](O[C@H]2C/C=C(\C)[C@@H]3C=C[C@@H]4[C@
@H](OCc5ccc(OC)cc5)[C@@H](C)C[C@H](C)[C@H]4[C@]3(C)/C(O)=C3\C(=O)O[C@
]4(CC(C=O)=C[C@H](OC(C)=O)[C@H]4/C=C\2C)C3=O)CC1(C)[N+](=O)[O-]
   299586 | CC1(C)C2CC[C@]3(C)C(CC=C4C5CC(C)(C)[C@@H](OC(=O)c6c6)[C@H
](OC(=O)/C=C/c6c6)[C@]5(C=O)[C@H](O)C[C@]43C)[C@@]2(C)CC[C@@H]1O
   317613 | Cn1cncc1C=O
(5 rows)



On Tue, Jul 20, 2021 at 11:55 PM Webster Homer <
webster.ho...@milliporesigma.com> wrote:

> I should have included the query. It looks like RD Kit is ignoring the H
> atom
>
> The user put in an explicit H
>
> ===MOL file after this
>
> aldehyde query
>
>   MJ192500
>
>
>
>   4  3  0  0  0  0  0  0  0  0999 V2000
>
>    -2.8123    1.5508    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>
>    -3.5267    1.1383    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>
>    -4.2412    1.5508    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
>
>    -3.5267    0.3133    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>
>   2  1  1  0  0  0  0
>
>   2  4  2  0  0  0  0
>
>   2  3  1  0  0  0  0
>
> M  END
>
> =MOL file above this
>
>
>
>
>
> *From:* Greg Landrum 
> *Sent:* Friday, July 16, 2021 11:38 PM
> *To:* Webster Homer 
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: [Rdkit-discuss] Substructure search for an aldehyde
> returns ketones and acids
>
>
>
>
>
> Hi Webster,
>
>
>
> Without seeing an actual query I am inclined to believe that it’s not a
> bug. The problem is more likely a query which has not been drawn explicitly
> or an easily made mistake in the way the cartridge is being used.
>
>
>
> Assuming that the aldehyde queries have been drawn with an explicit H atom
> connected to the C (apologies for not showing this, I’m on my phone and
> don’t have a sketcher available), you should be calling the cartridge
> function qmol_from_ctab(), not mol_from_ctab(), before doing the query.
> qmol_from_ctab() will use the H to help define the query.
>
>
>
> If you’re doing this and still seeing incorrect search results, please
> share a query and the way y

Re: [Rdkit-discuss] Substructure search for an aldehyde returns ketones and acids

2021-07-16 Thread Greg Landrum
Hi Webster,

Without seeing an actual query I am inclined to believe that it’s not a
bug. The problem is more likely a query which has not been drawn explicitly
or an easily made mistake in the way the cartridge is being used.

Assuming that the aldehyde queries have been drawn with an explicit H atom
connected to the C (apologies for not showing this, I’m on my phone and
don’t have a sketcher available), you should be calling the cartridge
function qmol_from_ctab(), not mol_from_ctab(), before doing the query.
qmol_from_ctab() will use the H to help define the query.
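
A minimal Python sketch, outside the cartridge, of why the hydrogen matters.
It assumes that a query without hydrogen information behaves like the plain
pattern C-C=O, and it uses an illustrative SMARTS (chosen here, not the
actual output of qmol_from_ctab()) to stand in for a query that requires one
H on the carbonyl carbon:

```python
from rdkit import Chem

# Targets: an aldehyde, a ketone and a carboxylic acid.
targets = {
    "acetaldehyde": Chem.MolFromSmiles("CC=O"),
    "acetone": Chem.MolFromSmiles("CC(C)=O"),
    "acetic acid": Chem.MolFromSmiles("CC(O)=O"),
}

# Query with no hydrogen information: any C-C=O fragment matches.
loose_query = Chem.MolFromSmiles("CC=O")

# Illustrative query that requires exactly one H on the carbonyl carbon.
strict_query = Chem.MolFromSmarts("[#6][CX3H1]=O")

loose_hits = [n for n, m in targets.items() if m.HasSubstructMatch(loose_query)]
strict_hits = [n for n, m in targets.items() if m.HasSubstructMatch(strict_query)]

print(loose_hits)
print(strict_hits)
```

The loose query hits all three compounds, which is the behavior reported in
the original question; the H-aware query keeps only the aldehyde.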

If you’re doing this and still seeing incorrect search results, please
share a query and the way you’re doing the search and we can try to help
(or diagnose the bug if there is one)

Best,
-greg


On Fri, 16 Jul 2021 at 17:53, Webster Homer <
webster.ho...@milliporesigma.com> wrote:

> We use the RDKit PostgreSQL cartridge as our substructure searcher. When a
> user sketches an aldehyde and submits the mol file as the query, RDKit
> returns aldehydes, but also returns ketones and acids. Is this a bug?
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Are Partial Charge Calculations Dependent on Conformers?

2021-06-30 Thread Greg Landrum
Hi Hao,

The reference for how the Gasteiger charges are calculated is in the
documentation for the function:
https://www.rdkit.org/docs/source/rdkit.Chem.rdPartialCharges.html#rdkit.Chem.rdPartialCharges.ComputeGasteigerCharges
It does not use atomic coordinates.

The MMFF charges are described in the MMFF94 papers (googling for MMFF94
will turn these up). They also do not use atomic coordinates.
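
A small sketch to check this for the Gasteiger charges: compute them once
with no conformer present and again after embedding with two different
random seeds. The molecule and the seeds are arbitrary choices made for the
illustration:

```python
from rdkit import Chem
from rdkit.Chem import rdDistGeom, rdPartialCharges


def gasteiger_charges(mol):
    """Return the Gasteiger charges of mol as a list of floats."""
    rdPartialCharges.ComputeGasteigerCharges(mol)
    return [a.GetDoubleProp("_GasteigerCharge") for a in mol.GetAtoms()]


m = Chem.AddHs(Chem.MolFromSmiles("OCCN"))

# Charges from the molecular graph alone: no conformer exists yet.
charges_graph = gasteiger_charges(m)

# Two independent embeddings of copies of the molecule.
m1 = Chem.Mol(m)
m2 = Chem.Mol(m)
rdDistGeom.EmbedMolecule(m1, randomSeed=42)
rdDistGeom.EmbedMolecule(m2, randomSeed=7)

charges_a = gasteiger_charges(m1)
charges_b = gasteiger_charges(m2)

# The three lists agree because only the connectivity is used.
same = all(
    abs(x - y) < 1e-9 and abs(x - z) < 1e-9
    for x, y, z in zip(charges_graph, charges_a, charges_b)
)
print(same)
```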

If you really need partial charges which are dependent on the 3D conformer
(and I wonder why you do), the only option in the RDKit would be to use
its integration with the YAeHMOP package to do a semi-empirical QM
calculation:

from rdkit import Chem
from rdkit.Chem import rdDistGeom
from rdkit.Chem import rdEHTTools
m = Chem.AddHs(Chem.MolFromSmiles('OCCN'))
rdDistGeom.EmbedMolecule(m)
ok,res = rdEHTTools.RunMol(m)
res.GetAtomicCharges()


Note that I call AddHs() there before generating the 3D coordinates. Recent
versions of the RDKit generate a warning if you don't do this. That's not
one which you should ignore: you generally need the Hs there in order to
get good conformations.

There are many other methods out there which derive charges from quantum
mechanical calculations, but those all require using external software.

Why do you want partial charges which are dependent on conformer?

-greg


On Thu, Jul 1, 2021 at 3:53 AM Hao  wrote:

> Hi RDKit community,
>
> I am not familiar with how partial charges are calculated and I couldn't
> seem to find anything in my searches.
>
> If you run the code below, you'll see that the partial charges are always
> the same, even though the embedded mol is different - which leads me to
> believe these partial charge calculations are not dependent on conformers
> (which I always thought they were?)
>
> Can someone with more knowledge than me confirm my hypothesis? Also does
> rdkit have any partial charge calculators that are dependent on conformers?
>
> mol = Chem.MolFromSmiles('C[C@@](CC1=CC(O)=C(O)C=C1)(NN)C(O)=O')
> AllChem.EmbedMolecule(mol, AllChem.ETKDG())
> AllChem.ComputeGasteigerCharges(mol)
> contribs = [float(mol.GetAtomWithIdx(i).GetProp('_GasteigerCharge')) for i
> in range(mol.GetNumAtoms())]
> fps = AllChem.MMFFGetMoleculeProperties(mol)
> mmff_partial_charges = [fps.GetMMFFPartialCharge(x) for x in
> range(mol.GetNumAtoms())]
> print(mmff_partial_charges)
> print(contribs)
>
> Thanks,
> Hao


Re: [Rdkit-discuss] Shape Tanimoto distance question

2021-06-28 Thread Greg Landrum
Hi Leon,

You can convert the Tanimoto distance to a similarity, but the formula is:
Similarity = 1 - Distance
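
As a sketch, the conversion can be wrapped in a small helper. The helper
names, the example molecules and the seed are made up for the illustration,
and the two molecules are not aligned before the comparison, which a real
workflow would normally do first:

```python
from rdkit import Chem
from rdkit.Chem import rdDistGeom, rdShapeHelpers


def embedded(smiles, seed=42):
    """Build a molecule with Hs and a single 3D conformer."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    rdDistGeom.EmbedMolecule(mol, randomSeed=seed)
    return mol


def shape_tanimoto_similarity(mol1, mol2):
    """Convert the shape Tanimoto distance into a similarity."""
    return 1.0 - rdShapeHelpers.ShapeTanimotoDist(mol1, mol2)


m1 = embedded("c1ccccc1O")
m2 = embedded("c1ccccc1N")

sim_self = shape_tanimoto_similarity(m1, m1)  # identical shapes
sim_pair = shape_tanimoto_similarity(m1, m2)  # unaligned pair
print(sim_self, sim_pair)
```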

Best,
-greg


On Tue, Jun 29, 2021 at 3:21 AM topgunhaides  wrote:

> Hi guys,
>
> A quick question:
>
> RDKit computes the "Shape Tanimoto distance" by calling the
> "ShapeTanimotoDist".
> I assume that similarities and distances can be interconverted using the
> following equation?
>
> Shape Tanimoto similarity = 1 / (1 + Shape Tanimoto distance)
>
> Correct?  Thank you!
>
> Best,
> Leon


Re: [Rdkit-discuss] Autodock Vina

2021-06-22 Thread Greg Landrum
Hi Velik,

This is a discussion list for the RDKit, not for Autodock Vina.

Here's the link for getting help about Autodock Vina:
http://vina.scripps.edu/questions.html

Best,
-greg

On Tue, Jun 22, 2021 at 10:08 AM Velik Velikov  wrote:

> Dear all,
>
>
>
> I am constructing new molecules (de novo design) that are drug-like with
> RDKit. I have my molecules in SMILES now and I need to check them with
> AutoDock Vina. I have never used it and I have been trying since last week
> but I kind of don’t know where to go from here.
>
> What is my config file, ligand or receptor? Do I need MGL Tools, PyMOL or
> something else?
>
> Also, I couldn’t run it on my mac - Big Sur, I tried with a VirtualBox but
> it didn’t work out either. I am thinking about installing Autodock Vina on
> my old windows laptop now. Appreciate any help with this tool. Thanks in
> advance.
>
>
> Best,
>
> Velik Velikov


Re: [Rdkit-discuss] Can substruct_count() accept qmol as the second input?

2021-06-21 Thread Greg Landrum
Hi Xinzhou,

I will take a look at adding support for this to the cartridge.

-greg


On Wed, Jun 9, 2021 at 9:54 PM Xinzhou Liu via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hi RDKit team,
>
> I'm doing some substructure counting using the RDKit cartridge in my
> postgres db. The SQL function substruct_count(mol, mol, bool) only accepts
> mol (strict molecule) as the second input, but what I would like to have is
> a little fuzziness, meaning using the qmol (molecule with query features)
> as the query substructure.
>
>However, the current "substruct_count" function does not accept that
> type. But the workaround I found is to create a SQL function
> "substruct_count(mol, qmol, bool)" to pass the qmol to the low-level c
> function "mol_substruct_count", which accepts qmol. Here is the create
> script of this function:
>
> CREATE OR REPLACE FUNCTION public.substruct_count(
> mol,
> qmol,
> boolean DEFAULT true)
> RETURNS integer
> LANGUAGE 'c'
> COST 1
> IMMUTABLE STRICT PARALLEL UNSAFE
> AS 'rdkit', 'mol_substruct_count'
> ;
>
> But creating such C function wrapper is not permitted in our new AWS RDS
> database. Can you add this overloading function "substruct_count(mol, qmol,
> bool)" in the rdkit cartridge? It would be beneficial for anyone who wishes
> to search a class of substructures rather than a single substructure using
> the qmol type.
>
> Please let me know if there is any question. Thank you!
>
> Best,
> Xinzhou Liu


Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-18 Thread Greg Landrum
Hi JP,

On Thu, Jun 17, 2021 at 8:37 PM JP Ebejer  wrote:

>
> I am trying to standardize(/normalize?) some molecules from different
> sources, to generate a set of descriptors for them.  I have done this a
> number of times, and each time I find the process slightly confusing.  I
> have the following questions please, if you don't mind:
>
>
As a starting point, in case you want more information about this topic:
I did a webinar/presentation on it earlier this year as part of the
RSC Open Science series.

My materials for that are in github:
https://github.com/greglandrum/RSC_OpenScience_Standardization_202104
and there's a youtube recording:
https://www.youtube.com/watch?v=eWTApNX8dJQ



> 1.  What is the relation between molvs and rdkit (I remember there was an
> integration project between the two a while back).  When I call
> rdMolStandardize does rdkit code or molvs code get called?  The github repo
> for molvs hasn't been updated in a while (2 yrs), but rdMolStandardize has.
>

When you call operations from rdMolStandardize it invokes RDKit code. That
code was started by Susan Leung as a Google Summer of Code project and we
have continued to improve and expand that code since then.


> 2.  What is the difference between standardization and normalization of a
> molecule?  Does one automatically imply the other or should these two
> processes be both run on a molecule?
>

I would be surprised if there were universal agreement about this, but when
I use the terms, normalization typically refers to making changes to
molecules to get "functional groups" (loosely defined) into a normal form,
while standardization is getting the molecules into a standard form in
preparation for doing something with them. Normalization is often part of
standardization; standardization can also include things like stripping
salts, neutralizing molecules, etc.
Normalization involves applying transformations like converting -N(=O)=O to
-[N+](=O)[O-] and converting -[S+2]([O-])[O-] to -S(=O)=O.


> 3.  Specifically, what is the difference between
> rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol), and
> rdMolStandardize.Normalize(mol)?  Should I call any of these three manually
> after I run "standardization/cleaning operations" such as uncharging,
> reionizing, etc?
>

SanitizeMol() is different from the others: it does a small amount of
normalization - fixing groups like nitro which are commonly drawn in a
hypervalent state but which can be represented in a charge-separated form
without needing weird valences - and some validation - rejecting molecules
with atoms that have non-physical valences, rejecting molecules that cannot
be kekulized - and a bunch of chemistry perception - ring finding,
calculating valences, finding aromatic systems, etc.

rdMolStandardize.Normalize() applies a bunch of standard transformations to
a molecule.

rdMolStandardize.Cleanup() does a number of standardization operations:
- removeHs
- disconnect metal atoms
- normalize the molecule
- reionize the molecule
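
A short sketch of the first and last of these. The input SMILES are
arbitrary examples; the nitro case relies on the sanitization that
MolFromSmiles() runs by default:

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Sanitization rewrites a hypervalent nitro group into the
# charge-separated form while parsing.
nitro = Chem.MolFromSmiles("CN(=O)=O")
nitro_smiles = Chem.MolToSmiles(nitro)
print(nitro_smiles)

# Cleanup() applies the standardization steps listed above and
# returns a new molecule.
salt = Chem.MolFromSmiles("[Na+].CC(=O)[O-]")
cleaned = rdMolStandardize.Cleanup(salt)
print(Chem.MolToSmiles(cleaned))
```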

> 4.  I understand what uncharge does, but what does reionizer do?
>

Reionizing does two things:
1. Adds a charge to a small set of free atoms which are likely counterions.
These include Na, Mg, Cl, etc.
1a. If the above added a positive charge: removes an H from an acidic group
to neutralize the positive charge that was added.
2. Moves negative charges from less acidic groups to more acidic groups.

> 5.  Is there a way to chain operations together
> standardize+ChooseLargestFragment+uncharge+normalize (am not sure the order
> makes sense here), other than creating a class instance for each calling
> the method, returning a new mol and using this mol in the next operation?
>

The easy "pipeline" type functions in rdMolStandardize are the xxxParent
functions.
- fragmentParent: cleanup(), pick largest fragment
- chargeParent: fragmentParent(); uncharge()

Note that this list will be more complete in the 2021.09 release.
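
A sketch of the two parent functions applied to a simple salt (sodium
acetate is an arbitrary example):

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

salt = Chem.MolFromSmiles("[Na+].CC(=O)[O-]")

# FragmentParent: cleanup, then keep the largest fragment.
frag_parent = rdMolStandardize.FragmentParent(salt)

# ChargeParent: fragment parent, then neutralize it.
charge_parent = rdMolStandardize.ChargeParent(salt)

frag_smiles = Chem.MolToSmiles(frag_parent)
charge_smiles = Chem.MolToSmiles(charge_parent)
print(frag_smiles, charge_smiles)
```

Each call returns a new molecule, so the steps can be chained without
creating the individual standardizer objects by hand.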


>
> Apologies for the many questions.  Have I missed the documentation about
> this?  I have found some excellent examples here:
> https://github.com/susanhleung/rdkit/blob/dev/GSOC2018_MolVS_Integration/rdkit/Chem/MolStandardize/tutorial/MolStandardize.ipynb
> (thanks!).  This is not exactly a cleaning pipeline, but still quite
> helpful to understand these methods.
>
>
The github link I provided above has some more up-to-date information about
what the code currently does.
This all needs to land in the RDKit documentation.

-greg


Re: [Rdkit-discuss] RDKit version in AWS Aurora?

2021-06-06 Thread Greg Landrum
Hi Brian,

On Mon, Jun 7, 2021 at 4:36 AM Brian Cole  wrote:

> This is a bit more of a question for AWS themselves, though I believe the
> RDKit build for the Postgres extension can be improved as well.
>
> The AWS documentation states, “RDKit extension version 3.8.”
>
> https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Updates.20180305.html
>
> However, it doesn’t appear like that 3.8 version number has been bumped in
> a few RDKit versions. When is that version supposed to be bumped? Or am I
> missing some other way to find the RDKit version in the Postgres extension?
>

A large part of the problem here is that we're not very good about
providing version information for the cartridge. Any changes here require
manual updates and I normally forget to either make those changes myself or
check that they've been done while reviewing PRs.

One thing which may be at least a little bit more up-to-date is the output
of the rdkit_version command in the cartridge itself:
chembl_28=# select rdkit_version();
 rdkit_version
---
 0.76.0
(1 row)

It looks like that was bumped from 0.74 to 0.75 in 2020.03. The bump to 0.76
will be in the 2021.09 release.

What I think will be most useful, but which won't be available until the
2021.09 release, is the rdkit_toolkit_version() command. This will show you
the actual RDKit version in the backend and has the advantage that it's
automatically updated.

-greg


Re: [Rdkit-discuss] XYZ to mol ???

2021-06-06 Thread Greg Landrum
Hi Joey,

This is a non-trivial problem and one which Jan Jensen seems to have solved
quite nicely with xyz2mol:
https://github.com/jensengroup/xyz2mol

Are you able to use that package?

-greg

On Sat, Jun 5, 2021 at 2:56 AM Storer, Joey (J)  wrote:

> Dear all,
>
>
>
> For molecular modeling workflows and interoperability with QM/MM etc.,
>
>
>
> Can RDKit gain a Chem.XyzToMol(xyz) functionality?
>
>
>
> Thanks for considering this.
>
>
>
> Joey Storer
>
> Dow, Inc.
>
> General Business


Re: [Rdkit-discuss] RDKfingerprint function

2021-06-03 Thread Greg Landrum
Sorry I missed this thread earlier.

I think the documentation answers this question:
https://www.rdkit.org/docs/RDKit_Book.html#rdkit-fingerprints

-greg


On Tue, May 18, 2021 at 12:34 PM Nils Weskamp 
wrote:

> Hi Din,
>
> to the best of my knowledge, the "RDKit" fingerprints are a path-based
> ("Daylight-like") fingerprint as described in slide 7 of that presentation.
>
> You may want to have a look at
>
> https://www.daylight.com/dayhtml/doc/theory/theory.finger.html
>
> for a more detailed description.
>
> Best,
> Nils
>
> Am 18.05.2021 um 12:27 schrieb דין עזרא:
> > Hi Nils,
> >
> > Thanks for your mail !
> > So if I understand correctly, the function of the fingerprint is related
> > to the topological fingerprint in the document?
> >
> > Thanks,
> > Din
> > On 18 May 2021, 13:10 +0300, Nils Weskamp ,
> wrote:
> >>
> >> There may be more recent sources available.
>
>
>


Re: [Rdkit-discuss] rdkit and pip

2021-06-03 Thread Greg Landrum
Hi Marco,

On Wed, May 26, 2021 at 2:37 PM Marco Stenta  wrote:

> Dear Colleagues,
> I recently came across this
> https://pypi.org/project/rdkit-pypi/
>
> is pip going to be supported officially by the dev community? any plan?
>

I'm not quite sure yet. I believe that at the moment the pip images are
still missing the extra data files that the RDKit requires in order to
correctly function.
After that's taken care of, we'll need one or more volunteers to make sure
that the rdkit-pypi images stay up to date. Just like the conda-forge
packages, this will be something that's community maintained, not something
the development team takes care of.


> getting out of the conda dependency might be beneficial to get
> slightly slimmer docker images.
>

If this is something you actually care about, you can just create a docker
image which has a local RDKit build matching the version of Python used in
the container.

-greg


Re: [Rdkit-discuss] smarts question

2021-06-02 Thread Greg Landrum
Hi Hao,

The RDKit's extensions to SMARTS are documented here:
https://www.rdkit.org/docs/RDKit_Book.html#smarts-support-and-extensions

There is not an extension for "N_lp" and what the translation should be is
VERY highly dependent on what the authors meant by "lone pair".
For example: does the N involved in an amide bond have a lone pair by the
authors' definition?
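
On the first question: the "^" hybridization primitive is one of the
documented extensions, so the torsion pattern from the paper can be used as
written. A quick sketch with two arbitrary test molecules:

```python
from rdkit import Chem

# Torsion pattern from the question: two sp3 atoms joined by a
# non-ring bond, each with one further neighbor.
patt = Chem.MolFromSmarts("[*:1]~[*^3:2]!@[*^3:3]~[*:4]")

butane = Chem.MolFromSmiles("CCCC")       # central atoms are sp3
butadiene = Chem.MolFromSmiles("C=CC=C")  # all carbons are sp2

butane_match = butane.HasSubstructMatch(patt)
butadiene_match = butadiene.HasSubstructMatch(patt)
print(butane_match, butadiene_match)
```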

-greg



On Wed, Jun 2, 2021 at 8:50 AM hwang929  wrote:

> Hi,
> I'm a student from the school of chemistry and molecular engineering, East
> China Normal University. I have some questions about smarts.
>
> I used the self defined smart writing method in this article(Torsion
> Library Reloaded: A New Version of Expert-Derived SMARTS Rules for
> Assessing Conformations of Small Molecules).
>  For example,   [*:1]~[*^3:2]!@[*^3:3]~[*:4] Here a "^3" denotes an sp3
> hybridized atom. In the same way,"^2" denotes an sp2 hybridized atom."^1"
> denotes an sp hybridized atom. But '^' symbol is not found in daylight, to
> my surprise I found that it can be read in and find the corresponding
> structure in rdkit. I don't know whether rdkit has a specific method to
> identify or what? If not correctly identified, how to express it(
> [*:1]~[*^3:2]!@[*^3:3]~[*:4])
>
> The second question: Another kind of smarts containing N_lp(eg:
> [CX4:1][CX4H2:2]!@[NX3;N_lp:3][CX4:4])
> N_lp explicitly requires a trivalent nitrogen with a lone pair.  for
> example sulfonamides .It is not recognized by rdkit. Do I have any way to
> express it? What can I replace with N_lp.
>
> Thanks
> Kind regards,
> Hao Wang
>
>
>


[Rdkit-discuss] 2021 RDKit UGM registration now open

2021-05-28 Thread Greg Landrum
Dear all,

Registration for the 2021 RDKit virtual UGM is now open:
https://www.eventbrite.com/e/virtual-10th-rdkit-ugm-2021-tickets-157206887031

I'm currently sending this announcement only to the mailing list so that
people here have a chance to register first. I'll do the usual social media
announcements next week. Please feel free to forward this to colleagues, but
I'd appreciate it if no one did a social media post about it until Monday.

If you'd like to do a talk, lightning talk, or virtual poster, please send
me email to let me know!

Best regards,
-greg


Re: [Rdkit-discuss] para-stereochemistry

2021-05-27 Thread Greg Landrum
I first heard the term "para stereochemistry" from Salome Rieder (a PhD
student at the ETH Zurich), who pointed me to this paper as a source for
the term:
https://pubs.acs.org/doi/pdf/10.1021/ci00016a003

I tend to call this "dependent stereochemistry" since I think it's clear,
but that's definitely something I just made up. :-)

-greg


On Wed, May 26, 2021 at 10:54 PM Jean-Marc Nuzillard <
jm.nuzill...@univ-reims.fr> wrote:

> Dear Paolo,
>
> According to https://goldbook.iupac.org/terms/view/P04921
> your interpretation is certainly the correct one.
> I still have to find how this r/s assignment is determined.
>
> Many thanks again,
>
> Jean-Marc
>
>
> Le 26/05/2021 à 22:40, Paolo Tosco a écrit :
> > Dear Jean-Marc,
> >
> > I believe it indicates what the IUPAC Gold Book refers to as
> pseudoasymmetry.
> > Let’s see if others agree with my interpretation.
> >
> > Cheers,
> > P.
> >
> >> On 26 May 2021, at 22:28, Jean-Marc Nuzillard <
> jm.nuzill...@univ-reims.fr> wrote:
> >>
> >> I believed I sent a message with the same title a few minutes ago, but
> apparently something went wrong.
> >>
> >> Reading the RDKit book about function FindMolChiralCenters(),
> >> I saw that it provides a better handling of para-stereochemistry.
> >> This concept is not familiar to me.
> >> Google did not help and sent me back to the RDKit Book.
> >> So, what is para-stereochemistry?
> >>
> >> Best regards,
> >>
> >> Jean-Marc
> >>
> >> --
> >> Jean-Marc Nuzillard
> >> Directeur de Recherches au CNRS
> >>
> >> Institut de Chimie Moléculaire de Reims
> >> CNRS UMR 7312
> >> Moulin de la Housse
> >> CPCBAI, Bâtiment 18
> >> BP 1039
> >> 51687 REIMS Cedex 2
> >> France
> >>
> >> Tel : 03 26 91 82 10
> >> Fax : 03 26 91 31 66
> >> http://www.univ-reims.fr/icmr
> >> http://eos.univ-reims.fr/LSD/CSNteam.html
> >>
> >> http://www.univ-reims.fr/LSD/
> >> http://www.univ-reims.fr/LSD/JmnSoft/
> >>
> >>
> >>
>
>
> --
> Jean-Marc Nuzillard
> Directeur de Recherches au CNRS
>
> Institut de Chimie Moléculaire de Reims
> CNRS UMR 7312
> Moulin de la Housse
> CPCBAI, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
>
> Tel : 03 26 91 82 10
> Fax : 03 26 91 31 66
> http://www.univ-reims.fr/icmr
> http://eos.univ-reims.fr/LSD/CSNteam.html
>
> http://www.univ-reims.fr/LSD/
> http://www.univ-reims.fr/LSD/JmnSoft/
>
>
>


Re: [Rdkit-discuss] Are the path-based fingerprints formally described in the scientific literature?

2021-05-19 Thread Greg Landrum
Hi Francois,

On Thu, May 20, 2021 at 3:19 AM Francois Berenger  wrote:

>
> The other day, I was looking for a paper describing them
> but the only thing I found was a reference to some Daylight
> product.
>
> I know there is a paper (maybe several in fact) for ECFP for example.
> Weren't the path-based FPs formally described somewhere?
>

No, they aren't. I did that implementation in the very early days of the
RDKit because we needed something for calculating similarity and machine
learning.
There's no paper describing the fingerprint. If you're looking for an
explanation or something to cite, the best thing is this section in the
documentation: https://rdkit.org/docs/RDKit_Book.html#rdkit-fingerprints
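
For reference, a minimal sketch of generating these fingerprints and
comparing them. The two esters are arbitrary examples, and the keyword
arguments simply spell out the documented defaults:

```python
from rdkit import Chem, DataStructs

m1 = Chem.MolFromSmiles("CCOC(=O)c1ccccc1")
m2 = Chem.MolFromSmiles("COC(=O)c1ccccc1")

# Path-based RDKit fingerprints: paths of 1 to 7 bonds hashed into 2048 bits.
fp1 = Chem.RDKFingerprint(m1, minPath=1, maxPath=7, fpSize=2048)
fp2 = Chem.RDKFingerprint(m2, minPath=1, maxPath=7, fpSize=2048)

sim = DataStructs.TanimotoSimilarity(fp1, fp2)
print(sim)
```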

Best,
-greg


Re: [Rdkit-discuss] Problem finding potential stereo-centres in bridged bicyclics involving 4-membered rings?

2021-05-19 Thread Greg Landrum
Thanks James!

On Wed, 19 May 2021 at 17:40, James Davidson 
wrote:

> Hi Greg,
>
>
>
> Thanks for the response (and sorry to be the bearer of bad news!).
>
> Issue added:  https://github.com/rdkit/rdkit/issues/4155
>
>
>
> Kind regards
>
>
>
> James
>
>
>
> *From:* Greg Landrum 
> *Sent:* 19 May 2021 14:59
> *To:* James Davidson 
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: [Rdkit-discuss] Problem finding potential stereo-centres
> in bridged bicyclics involving 4-membered rings?
>
>
>
> Hi James,
>
>
>
> I don't think that's the same bug as #3490. I think it's something
> different; "yay".
>
> ;-)
>
>
>
> It would be great if you could file a github issue for this.
>
>
>
> Thanks,
>
> -greg
>
>
>
>
>
> On Wed, May 19, 2021 at 3:20 PM James Davidson 
> wrote:
>
> Dear All,
>
>
>
> I’ve got a strong suspicion that what I am seeing is related to the open
> issue 3490 (https://github.com/rdkit/rdkit/issues/3490), but as I can’t
> seem to find a mention of a non-spiro problem then I thought I would share.
>
> Tested in 2020.09.4 and 2021.03.2 with the same result.
>
>
>
> from rdkit import Chem
>
> smi_list = ['CC1CCC(CC1)C(N)=O', 'CC12CCC(CC1)(C2)C(N)=O',
>             'CC1CC(C1)C(N)=O', 'CC12CC(C1)(CC2)C(N)=O']
>
> for smi in smi_list:
>     mol = Chem.MolFromSmiles(smi)
>     # show_mol: wrapper function for new drawing code in jupyter
>     display(show_mol(mol, size=(450, 200)))
>     print(list(Chem.FindPotentialStereo(mol)))
>     print(Chem.FindMolChiralCenters(mol, includeUnassigned=True,
>                                     useLegacyImplementation=False))
>
>
>
> The 4 cases are:
>
>- Symmetrically-disubstituted 6-membered ring
>- A bridged version (using a 1-atom bridge to avoid a completely
>symmetrical product)
>- Symmetrically-disubstituted 4-membered ring
>- A bridged version (this time using a 2-atom bridge to avoid symmetry)
>
>
>
> And this is what I see:
>
>
>
>
>
> In the case of the bridged 4-membered ring (or bridged 5-membered ring,
> depending on your viewpoint!), FindPotentialStereo() fails to identify the
> two potential stereo atoms.
>
> If anyone can spot if this is the same issue as 3490, or something
> different, then that would be appreciated!
>
>
>
> Kind regards
>
>
>
> James


Re: [Rdkit-discuss] Problem finding potential stereo-centres in bridged bicyclics involving 4-membered rings?

2021-05-19 Thread Greg Landrum
Hi James,

I don't think that's the same bug as #3490. I think it's something
different; "yay".
;-)

It would be great if you could file a github issue for this.

Thanks,
-greg


On Wed, May 19, 2021 at 3:20 PM James Davidson 
wrote:

> Dear All,
>
>
>
> I’ve got a strong suspicion that what I am seeing is related to the open
> issue 3490 (https://github.com/rdkit/rdkit/issues/3490), but as I can’t
> seem to find a mention of a non-spiro problem then I thought I would share.
>
> Tested in 2020.09.4 and 2021.03.2 with the same result.
>
>
>
> from rdkit import Chem
>
> smi_list = ['CC1CCC(CC1)C(N)=O', 'CC12CCC(CC1)(C2)C(N)=O',
>             'CC1CC(C1)C(N)=O', 'CC12CC(C1)(CC2)C(N)=O']
>
> for smi in smi_list:
>     mol = Chem.MolFromSmiles(smi)
>     # show_mol: wrapper function for new drawing code in jupyter
>     display(show_mol(mol, size=(450, 200)))
>     print(list(Chem.FindPotentialStereo(mol)))
>     print(Chem.FindMolChiralCenters(mol, includeUnassigned=True,
>                                     useLegacyImplementation=False))
>
>
>
> The 4 cases are:
>
>- Symmetrically-disubstituted 6-membered ring
>- A bridged version (using a 1-atom bridge to avoid a completely
>symmetrical product)
>- Symmetrically-disubstituted 4-membered ring
>- A bridged version (this time using a 2-atom bridge to avoid symmetry)
>
>
>
> And this is what I see:
>
>
>
>
>
> In the case of the bridged 4-membered ring (or bridged 5-membered ring,
> depending on your viewpoint!), FindPotentialStereo() fails to identify the
> two potential stereo atoms.
>
> If anyone can spot if this is the same issue as 3490, or something
> different, then that would be appreciated!
>
>
>
> Kind regards
>
>
>
> James
> --
>
> PLEASE READ - This email is confidential and may be privileged. It is
> intended for the named addressee(s) only and access to it by anyone else is
> unauthorised. If you are not an addressee, any disclosure or copying of the
> contents of this email or any action taken (or not taken) in reliance on it
> is unauthorised and may be unlawful. If you have received this email in
> error, please notify the sender or postmas...@vernalis.com. Email is not
> a secure method of communication and the Company cannot accept
> responsibility for the accuracy or completeness of this message or any
> attachment(s). Please check this email for virus infection for which the
> Company accepts no responsibility. If verification of this email is sought
> then please request a hard copy. Unless otherwise stated, any views or
> opinions presented are solely those of the author and do not represent
> those of the Company.
>
> Vernalis (R) Limited (no. 1985479)
> Granta Park, Great Abington
> Cambridge, CB21 6GB, United Kingdom
> Tel: +44 (0)1223 895 555
> --
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] unexpected chiral center

2021-05-17 Thread Greg Landrum
Hi Jean-Marc,

In that particular configuration:
[image: image.png]
 the central atom is not a chiral center since atoms 1 and 5 have the same
absolute stereo.

However, if you change the stereo of either atom 1 or 5, then the central
atom can be a chiral center:

[image: image.png]

This possibility is why FindMolChiralCenters() flags that atom as a
possible stereocenter.
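
A minimal sketch of that comparison, assuming the RDKit is importable. The helper name `possible_centers` is hypothetical, and the second SMILES (the same skeleton with the chirality tag on atom 5 inverted) is an illustrative choice rather than something taken from the original message:

```python
from rdkit import Chem


def possible_centers(smiles):
    """Return (atom index, label) pairs for assigned and possible stereocenters."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.FindMolChiralCenters(
        mol,
        force=True,
        includeUnassigned=True,
        useLegacyImplementation=False,
    )


# The SMILES from the original question.
same = possible_centers("C[C@H](O)C(O)[C@@H](O)C")
# Same skeleton with the chirality tag on atom 5 inverted (illustrative).
flipped = possible_centers("C[C@H](O)C(O)[C@H](O)C")

print(same)
print(flipped)
```

Comparing the two printed lists shows how the central atom (index 3) is reported in each configuration; the exact labels may depend on the RDKit version.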

-greg



On Mon, May 17, 2021 at 12:09 PM Jean-Marc Nuzillard <
jm.nuzill...@univ-reims.fr> wrote:

> Dear all,
>
> The determination of the absolute configuration of chiral centres is
> certainly not an easy problem.
> Even recognizing that a carbon atom is an asymmetric one is not that
> trivial, even for humans.
> I tried:
>  >>> smi = "C[C@H](O)C(O)[C@@H](O)C"
>  >>> m = Chem.MolFromSmiles(smi)
>  >>> Chem.FindMolChiralCenters(m,force=True,includeUnassigned=True,useLegacyImplementation=False)
> [(1, 'S'), (3, '?'), (5, 'S')]
> but the central carbon atom of this compound, indexed "3", is not an
> asymmetric one, is it?
>
> Best regards,
>
> Jean-Marc
>
> --
> Jean-Marc Nuzillard
> Directeur de Recherches au CNRS
>
> Institut de Chimie Moléculaire de Reims
> CNRS UMR 7312
> Moulin de la Housse
> CPCBAI, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
>
> Tel : 03 26 91 82 10
> Fax : 03 26 91 31 66
> http://www.univ-reims.fr/icmr
> http://eos.univ-reims.fr/LSD/CSNteam.html
>
> http://www.univ-reims.fr/LSD/
> http://www.univ-reims.fr/LSD/JmnSoft/
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


[Rdkit-discuss] [RDKit UGM2021] Save the date: Oct 14 and 15

2021-05-10 Thread Greg Landrum
Hi,

This year's RDKit UGM is going to take place October 14 and 15. It will,
unfortunately, once again be a purely virtual event. Hopefully next year we
will be able to travel again and all get together in one physical location,
but this year it's not possible to really plan an in-person meeting.

Since it seemed to work well last time, we'll do a combination of zoom and
either discord or some other text-based chat functionality and will have
two sessions per meeting day: one earlier in the day which is easier for
people in Asia to attend and one later in the day which is easier for
people in the Americas.

I'll send around more info and a link to the registration in the next week
or so.

I'd also like to try a virtual hackathon of some type, but will schedule
that for a different time, probably sometime this summer. Again, more
details on that soon.

Best,
-greg


Re: [Rdkit-discuss] Chem.inchi module

2021-05-04 Thread Greg Landrum
Hi Gonzalo,

Yes, the RDKit uses the InChI trust library to handle InChIs.

-greg

On Tue, May 4, 2021 at 7:30 AM Gonzalo Colmenarejo <
colmenarejo.gonz...@gmail.com> wrote:

> Hi Greg,
>
> does the RDKit use behind the functions in the Chem.inchi module the
> InChi-Trust software (e.g. the libinchi.dll for Windows)?
>
> Thanks a lot
>
> Gonzalo
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] Do we have an exact implementation of Bemis-Murcko scaffolds in rdkit?

2021-04-27 Thread Greg Landrum
I'm not sure why you'd want to reimplement something that's already there,
but if this works better for you...

the easiest way to get a single function you could call would be to do
something like:

In [18]: def MolToGenericScaffold(mol):
    ...:     return MurckoScaffold.MakeScaffoldGeneric(MurckoScaffold.GetScaffoldForMol(mol))
    ...:
In [19]: Chem.MolToSmiles(MolToGenericScaffold(Chem.MolFromSmiles('CCc1ccc(O)cc1C(=O)C1CC1')))
Out[19]: 'CC(C1CCCCC1)C1CC1'
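
For comparison, the "strip terminal atoms until none are left" idea from the quoted message below can be sketched without the RDKit, on a plain adjacency map. This is only an illustration under stated assumptions: the function names are hypothetical, the bond list is a hand-written heavy-atom graph for CCc1ccc(O)cc1C(=O)C1CC1, and atoms with zero or one neighbour are both treated as terminal, so a purely acyclic input ends up empty:

```python
def build_adjacency(bonds):
    """Turn a list of (i, j) bonds into an undirected adjacency map."""
    adjacency = {}
    for i, j in bonds:
        adjacency.setdefault(i, set()).add(j)
        adjacency.setdefault(j, set()).add(i)
    return adjacency


def prune_terminal_atoms(adjacency):
    """Repeatedly drop atoms with at most one neighbour; rings and linkers remain."""
    graph = {node: set(nbrs) for node, nbrs in adjacency.items()}
    terminal = [n for n, nbrs in graph.items() if len(nbrs) <= 1]
    while terminal:
        for node in terminal:
            # Remove the atom and detach it from any neighbour still present.
            for nbr in graph.pop(node):
                if nbr in graph:
                    graph[nbr].discard(node)
        terminal = [n for n, nbrs in graph.items() if len(nbrs) <= 1]
    return graph


# Heavy-atom graph of CCc1ccc(O)cc1C(=O)C1CC1, atoms numbered in SMILES order.
bonds = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (7, 8), (8, 2),
    (8, 9), (9, 10), (9, 11), (11, 12), (12, 13), (13, 11),
]
framework = prune_terminal_atoms(build_adjacency(bonds))
print(sorted(framework))  # prints [2, 3, 4, 5, 7, 8, 9, 11, 12, 13]
```

Note that this pure pruning also drops the carbonyl oxygen (atom 10), whereas the scaffold SMILES from GetScaffoldForMol() in this thread keeps it.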


-greg


On Tue, Apr 27, 2021 at 4:32 AM Francois Berenger  wrote:

> On 27/04/2021 10:12, Francois Berenger wrote:
> > On 26/04/2021 23:35, Greg Landrum wrote:
> >> Hi Francois,
> >>
> >> The implementation which is there does, I believe, the right thing.
> >> However... first you need to find the Murcko Scaffold, then you can
> >> convert that scaffold to the generic form:
> >>
> >>> In [5]: m = Chem.MolFromSmiles('CCc1ccc(O)cc1C(=O)C1CC1')
> >>> In [6]: scaff = MurckoScaffold.GetScaffoldForMol(m)
> >>> In [7]: Chem.MolToSmiles(scaff)
> >>> Out[7]: 'O=C(c1ccccc1)C1CC1'
> >>> In [8]: framework = MurckoScaffold.MakeScaffoldGeneric(scaff)
> >>> In [9]: print(Chem.MolToSmiles(framework))
> >>> CC(C1CCCCC1)C1CC1
> >
> > Ok, maybe this two-step process is a little bit better, but still
> > not exactly what I would expect in some cases.
> >
> > I'll say if I program something which I prefer.
>
> Hello,
>
> I end up with this:
> ---
> def find_terminal_atoms(mol):
>     res = []
>     for a in mol.GetAtoms():
>         if len(a.GetBonds()) == 1:
>             res.append(a)
>     return res
>
> # Bemis, G. W., & Murcko, M. A. (1996).
> # "The properties of known drugs. 1. Molecular frameworks."
> # Journal of medicinal chemistry, 39(15), 2887-2893.
> def BemisMurckoFramework(mol):
>     # keep only Heavy Atoms (HA)
>     only_HA = rdkit.Chem.rdmolops.RemoveHs(mol)
>     # switch all HA to Carbon
>     rw_mol = Chem.RWMol(only_HA)
>     for i in range(rw_mol.GetNumAtoms()):
>         rw_mol.ReplaceAtom(i, Chem.Atom(6))
>     # switch all non single bonds to single
>     non_single_bonds = []
>     for b in rw_mol.GetBonds():
>         if b.GetBondType() != Chem.BondType.SINGLE:
>             non_single_bonds.append(b)
>     for b in non_single_bonds:
>         j = b.GetBeginAtomIdx()
>         k = b.GetEndAtomIdx()
>         rw_mol.RemoveBond(j, k)
>         rw_mol.AddBond(j, k, Chem.BondType.SINGLE)
>     # as long as there are terminal atoms, remove them
>     terminal_atoms = find_terminal_atoms(rw_mol)
>     while terminal_atoms != []:
>         for a in terminal_atoms:
>             for b in a.GetBonds():
>                 rw_mol.RemoveBond(b.GetBeginAtomIdx(), b.GetEndAtomIdx())
>             rw_mol.RemoveAtom(a.GetIdx())
>         terminal_atoms = find_terminal_atoms(rw_mol)
>     return rw_mol.GetMol()
> ---
>
> I don't claim this is very efficient Python code. I am not very good at
> snake charming.
>
> Regards,
> F.
>
> >> Best,
> >> -greg
> >>
> >> On Mon, Apr 26, 2021 at 11:15 AM Francois Berenger 
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> I am trying MurckoScaffold.MakeScaffoldGeneric(mol),
> >>> but this keeps the side chains.
> >>>
> >>> While my understanding of BM scaffolds is that only rings
> >>> and ring linkers should be kept.
> >>>
> >>> The fact that the rdkit implementation keeps the
> >>> side chains makes Murcko scaffolds a much less powerful filter
> >>> to enforce molecular diversity.
> >>>
> >>> And I don't even see any option to force the standard/vanilla
> >>> behavior.
> >>> Or, am I missing something?
> >>>
> >>> Regards,
> >>> F.
> >>>
> >>> ___
> >>> Rdkit-discuss mailing list
> >>> Rdkit-discuss@lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> >
> >
> > ___
> > Rdkit-discuss mailing list
> > Rdkit-discuss@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] Do we have an exact implementation of Bemis-Murcko scaffolds in rdkit?

2021-04-26 Thread Greg Landrum
Hi Francois,

The implementation which is there does, I believe, the right thing.
However... first you need to find the Murcko Scaffold, then you can convert
that scaffold to the generic form:

In [5]: m = Chem.MolFromSmiles('CCc1ccc(O)cc1C(=O)C1CC1')
In [6]: scaff = MurckoScaffold.GetScaffoldForMol(m)
In [7]: Chem.MolToSmiles(scaff)
Out[7]: 'O=C(c1ccccc1)C1CC1'
In [8]: framework = MurckoScaffold.MakeScaffoldGeneric(scaff)
In [9]: print(Chem.MolToSmiles(framework))
CC(C1CCCCC1)C1CC1
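
A short script version of the same comparison, as a sketch assuming the RDKit is importable. The variable names `direct` and `framework` are illustrative; the point is only that calling MakeScaffoldGeneric() directly on the molecule keeps the side chains, while the two-step route removes them first:

```python
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

mol = Chem.MolFromSmiles("CCc1ccc(O)cc1C(=O)C1CC1")

# One step: atoms and bonds are made generic, but side chains are still there.
direct = MurckoScaffold.MakeScaffoldGeneric(mol)

# Two steps: strip the side chains first, then make the scaffold generic.
scaffold = MurckoScaffold.GetScaffoldForMol(mol)
framework = MurckoScaffold.MakeScaffoldGeneric(scaffold)

print(Chem.MolToSmiles(direct), direct.GetNumAtoms())
print(Chem.MolToSmiles(framework), framework.GetNumAtoms())
```

The atom counts make the difference easy to spot when the two SMILES strings look similar at a glance.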



Best,
-greg


On Mon, Apr 26, 2021 at 11:15 AM Francois Berenger  wrote:

> Hello,
>
> I am trying MurckoScaffold.MakeScaffoldGeneric(mol),
> but this keeps the side chains.
>
> While my understanding of BM scaffolds is that only rings
> and ring linkers should be kept.
>
> The fact that the rdkit implementation keeps the
> side chains makes Murcko scaffolds a much less powerful filter
> to enforce molecular diversity.
>
> And I don't even see any option to force the standard/vanilla behavior.
> Or, am I missing something?
>
> Regards,
> F.
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] ModuleNotFoundError: No module named 'rdkit'

2021-04-14 Thread Greg Landrum
Ok, you aren't having problems with the RDKit install or with the RDKit and
IPython; you are having problems with spyder + IPython + the RDKit.
It would have been very, very helpful if you had mentioned that at the
beginning so that I could have avoided wasting our time trying to diagnose
rdkit + ipython problems.

I don't use spyder, but the things I told you to look at while trying to
diagnose the rdkit+ipython problems may help you here. I won't be able to
help any more.
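
One generic check that may help here, sketched with the standard library only: run the snippet below both in the shell-launched ipython and in the Spyder console and compare the results. The function name `describe_interpreter` is hypothetical, and the sketch assumes a conda setup where activation sets the `CONDA_PREFIX` environment variable:

```python
import os
import sys


def describe_interpreter():
    """Report which Python is running and whether it matches the active conda env."""
    active_env = os.environ.get("CONDA_PREFIX")  # set by `conda activate`
    same_env = active_env is not None and (
        os.path.normcase(os.path.realpath(sys.prefix))
        == os.path.normcase(os.path.realpath(active_env))
    )
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda_prefix": active_env,
        "matches_active_env": same_env,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If the two consoles report different `executable` paths, the IDE is using a different interpreter from the one the RDKit was installed into.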

-greg


On Wed, Apr 14, 2021 at 6:25 PM Andrés Sánchez Ruiz <
andressanchezrui...@gmail.com> wrote:

> Dear Christos,
>
> Yes, I activate my environment and start spyder from there but I get the
> error.
>
> Thank you for answering.
>
> Best regards,
>
> Andrés
>
> On Wed, 14 Apr 2021 at 18:20, Christos Kannas wrote:
> >
> > Hi Andres,
> >
> > Maybe Spyder runs on the base conda environment.
> > Do you run Spyder from your activated environment?
> >
> > Kind regards,
> >
> > Christos
> >
> > On Wed, Apr 14, 2021, 17:52 Andrés Sánchez Ruiz <
> andressanchezrui...@gmail.com> wrote:
> >>
> >> Dear Greg,
> >>
> >> It works! It seems I can call functions of RDKit from this console,
> >> however, when I start spyder and try to run them I still get the
> >> error. Could it be something related to the spyder interpreter?
> >>
> >> Best regards,
> >>
> >> Andrés
> >>
> >> On Wed, 14 Apr 2021 at 17:38, Greg Landrum wrote:
> >> >
> >> > That looks good so far.
> >> > So what happens in that exact same shell if you then start ipython
> >> > and do "import rdkit"?
> >> >
> >> > -greg
> >> >
> >> >
> >> > On Wed, Apr 14, 2021 at 5:33 PM Andrés Sánchez Ruiz <
> andressanchezrui...@gmail.com> wrote:
> >> >>
> >> >> Dear Greg,
> >> >>
> >> >> After activating my environment (foodpains) I wrote the command "
> >> >> ipython -c 'import IPython;import
> >> >> rdkit;print(IPython.__file__,rdkit.__file__)' ". Right after getting
> >> >> the output I wrote: " where ipython ". This is what I get:
> >> >>
> >> >> (foodpains) C:\Users\Andres Sanchez>ipython -c "import IPython;import
> >> >> rdkit;print(IPython.__file__,rdkit.__file__)"
> >> >> C:\Anaconda\envs\foodpains\lib\site-packages\IPython\__init__.py
> >> >> C:\Anaconda\envs\foodpains\lib\site-packages\rdkit\__init__.py
> >> >>
> >> >> (foodpains) C:\Users\Andres Sanchez>where ipython
> >> >> C:\Anaconda\envs\foodpains\Scripts\ipython.exe
> >> >> C:\Anaconda\Scripts\ipython.exe
> >> >>
> >> >> Best regards,
> >> >>
> >> >> Andrés
> >> >>
> >> >> On Wed, 14 Apr 2021 at 17:06, Greg Landrum wrote:
> >> >> >
> >> >> > That looks good. Please send the output of:
> >> >> > ipython -c 'import IPython;import
> rdkit;print(IPython.__file__,rdkit.__file__)'
> >> >> >
> >> >> > and we also need to figure out exactly which version of ipython
> you are running.
> >> >> >
> >> >> > If you are running these commands in the command shell, that's
> >> >> > where ipython
> >> >> >
> >> >> > in powershell:
> >> >> > gcm ipython
> >> >> >
> >> >> > if you're using a bash shell:
> >> >> > which ipython
> >> >> >
> >> >> > Please run the ipython -c and which/where/gcm command directly
> after each other and paste in both the command you executed and its output.
> >> >> >
> >> >> > -greg
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Wed, Apr 14, 2021 at 4:46 PM Andrés Sánchez Ruiz <
> andressanchezrui...@gmail.com> wrote:
> >> >> >>
> >> >> >> Dear Greg,
> >> >> >>
> >> >> >> This is what I see after activating my environment (foodpains) and
> >> >> >> introducing your command:
> >> >> >>
> >> >> >> C:\Anaconda\envs\foodpains\lib\site-packages\IPython\__init__.py
> >

Re: [Rdkit-discuss] ModuleNotFoundError: No module named 'rdkit'

2021-04-14 Thread Greg Landrum
That looks good so far.
So what happens in that exact same shell if you then start ipython
and do "import rdkit"?

-greg


On Wed, Apr 14, 2021 at 5:33 PM Andrés Sánchez Ruiz <
andressanchezrui...@gmail.com> wrote:

> Dear Greg,
>
> After activating my environment (foodpains) I wrote the command "
> ipython -c 'import IPython;import
> rdkit;print(IPython.__file__,rdkit.__file__)' ". Right after getting
> the output I wrote: " where ipython ". This is what I get:
>
> (foodpains) C:\Users\Andres Sanchez>ipython -c "import IPython;import
> rdkit;print(IPython.__file__,rdkit.__file__)"
> C:\Anaconda\envs\foodpains\lib\site-packages\IPython\__init__.py
> C:\Anaconda\envs\foodpains\lib\site-packages\rdkit\__init__.py
>
> (foodpains) C:\Users\Andres Sanchez>where ipython
> C:\Anaconda\envs\foodpains\Scripts\ipython.exe
> C:\Anaconda\Scripts\ipython.exe
>
> Best regards,
>
> Andrés
>
> On Wed, 14 Apr 2021 at 17:06, Greg Landrum wrote:
> >
> > That looks good. Please send the output of:
> > ipython -c 'import IPython;import
> rdkit;print(IPython.__file__,rdkit.__file__)'
> >
> > and we also need to figure out exactly which version of ipython you are
> running.
> >
> > If you are running these commands in the command shell, that's
> > where ipython
> >
> > in powershell:
> > gcm ipython
> >
> > if you're using a bash shell:
> > which ipython
> >
> > Please run the ipython -c and which/where/gcm command directly after
> each other and paste in both the command you executed and its output.
> >
> > -greg
> >
> >
> >
> >
> > On Wed, Apr 14, 2021 at 4:46 PM Andrés Sánchez Ruiz <
> andressanchezrui...@gmail.com> wrote:
> >>
> >> Dear Greg,
> >>
> >> This is what I see after activating my environment (foodpains) and
> >> introducing your command:
> >>
> >> C:\Anaconda\envs\foodpains\lib\site-packages\IPython\__init__.py
> >> C:\Anaconda\envs\foodpains\lib\site-packages\rdkit\__init__.py
> >>
> >> Best regards,
> >>
> >> Andrés
> >>
> >> On Wed, 14 Apr 2021 at 15:42, Greg Landrum wrote:
> >> >
> >> > What do you see when you execute this quick test to ensure that
> ipython and the rdkit are both really installed?
> >> >
> >> > python -c 'import IPython;import
> rdkit;print(IPython.__file__,rdkit.__file__)'
> >> >
> >> > -greg
> >> >
> >> > On Wed, Apr 14, 2021 at 2:58 PM Andrés Sánchez Ruiz <
> andressanchezrui...@gmail.com> wrote:
> >> >>
> >> >> Hello,
> >> >>
> >> >> I have not been able to solve the issue yet after installing ipython
> >> >> in the same environment in which I have RDKIT.
> >> >>
> >> >> ipython            7.22.0       py39hd4e2768_0
> >> >> ipython_genutils   0.2.0        pyhd3eb1b0_1
> >> >> .
> >> >> .
> >> >> .
> >> >> rdkit              2021.03.1    py39hfadf033_0    conda-forge
> >> >>
> >> >> From this environment I can call pandas (for example) but not RDKIT.
> >> >> What is still not working?
> >> >>
> >> >> Best regards,
> >> >>
> >> >> Andrés
> >> >>
> >> >>
> >> >> ___
> >> >> Rdkit-discuss mailing list
> >> >> Rdkit-discuss@lists.sourceforge.net
> >> >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] ModuleNotFoundError: No module named 'rdkit'

2021-04-14 Thread Greg Landrum
That looks good. Please send the output of:
ipython -c 'import IPython;import
rdkit;print(IPython.__file__,rdkit.__file__)'

and we also need to figure out exactly which version of ipython you are
running.

If you are running these commands in the command shell, that's
where ipython

in powershell:
gcm ipython

if you're using a bash shell:
which ipython

Please run the ipython -c and which/where/gcm command directly after each
other and paste in both the command you executed and its output.

-greg




On Wed, Apr 14, 2021 at 4:46 PM Andrés Sánchez Ruiz <
andressanchezrui...@gmail.com> wrote:

> Dear Greg,
>
> This is what I see after activating my environment (foodpains) and
> introducing your command:
>
> C:\Anaconda\envs\foodpains\lib\site-packages\IPython\__init__.py
> C:\Anaconda\envs\foodpains\lib\site-packages\rdkit\__init__.py
>
> Best regards,
>
> Andrés
>
> On Wed, 14 Apr 2021 at 15:42, Greg Landrum wrote:
> >
> > What do you see when you execute this quick test to ensure that ipython
> and the rdkit are both really installed?
> >
> > python -c 'import IPython;import
> rdkit;print(IPython.__file__,rdkit.__file__)'
> >
> > -greg
> >
> > On Wed, Apr 14, 2021 at 2:58 PM Andrés Sánchez Ruiz <
> andressanchezrui...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I have not been able to solve the issue yet after installing ipython
> >> in the same environment in which I have RDKIT.
> >>
> >> ipython            7.22.0       py39hd4e2768_0
> >> ipython_genutils   0.2.0        pyhd3eb1b0_1
> >> .
> >> .
> >> .
> >> rdkit              2021.03.1    py39hfadf033_0    conda-forge
> >>
> >> From this environment I can call pandas (for example) but not RDKIT.
> >> What is still not working?
> >>
> >> Best regards,
> >>
> >> Andrés
> >>
> >>
> >> ___
> >> Rdkit-discuss mailing list
> >> Rdkit-discuss@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] ModuleNotFoundError: No module named 'rdkit'

2021-04-14 Thread Greg Landrum
What do you see when you execute this quick test to ensure that ipython and
the rdkit are both really installed?

python -c 'import IPython;import
rdkit;print(IPython.__file__,rdkit.__file__)'

-greg

On Wed, Apr 14, 2021 at 2:58 PM Andrés Sánchez Ruiz <
andressanchezrui...@gmail.com> wrote:

> Hello,
>
> I have not been able to solve the issue yet after installing ipython
> in the same environment in which I have RDKIT.
>
> ipython            7.22.0       py39hd4e2768_0
> ipython_genutils   0.2.0        pyhd3eb1b0_1
> .
> .
> .
> rdkit              2021.03.1    py39hfadf033_0    conda-forge
>
> From this environment I can call pandas (for example) but not RDKIT.
> What is still not working?
>
> Best regards,
>
> Andrés
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] ModuleNotFoundError: No module named 'rdkit'

2021-04-07 Thread Greg Landrum
Hi Andrés,

The typical reason for this problem is that you created a separate
environment for the RDKit and installed the package there, but forgot to
install ipython. When this happens ipython is run from the base environment
and can't find the rdkit. Can you please confirm that you have ipython
installed in the same environment that you installed the RDKit itself in?
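
A quick way to check this from inside whichever interpreter raises the error, as a sketch using only the standard library (the helper name `locate` is hypothetical):

```python
import importlib.util
import sys


def locate(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
for name in ("IPython", "rdkit"):
    print(name, "->", locate(name))
```

If `rdkit` comes back as None while `IPython` resolves to a path under the base environment, the interpreter being run is not the one from the environment where the RDKit was installed.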

-greg

On Wed, Apr 7, 2021 at 4:38 PM Andrés Sánchez Ruiz <
andressanchezrui...@gmail.com> wrote:

> To whom it may concern,
>
> I am having some trouble with the rdkit installation, the error I get is
> the following:
> > import rdkit
> Traceback (most recent call last):
>
>   File "", line 1, in 
> import rdkit
>
> ModuleNotFoundError: No module named 'rdkit'
>
> However, when I check in my environment I can see the module installed:
> rdkit              2021.03.1    py39hfadf033_0    conda-forge
>
> I followed both the guide offered in your page:
> https://www.rdkit.org/docs/Install.html and two other videos on youtube
> that describe the procedure: https://www.youtube.com/watch?v=3JywpzUKon8
> and https://www.youtube.com/watch?v=UmW9Cr8uF5g which are slightly
> different.
> I have Windows 10 installed on this computer and python version 3.8.5 (the
> one that comes with anaconda). If you need any further information that I
> have left out, please let me know.
>
> Thanks in advance,
>
> Best regards,
>
> Andrés
>
> P.S. I first installed anaconda with the path option, then I uninstalled
> it to see if that was the source of the error but still got it.
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] RDKit - contributing conformational entropy descriptor

2021-03-29 Thread Greg Landrum
Hi Geoff,

Congrats to you and co-authors on the paper and thanks for offering to
contribute to the RDKit.

Given the dependence on py_rdl, this probably isn't currently a good match
for the core RDKit (we try to minimize adding additional external
dependencies), but it would be great to have it available in the RDKit
Contrib directory. The only real requirement there is to have a README.md
file describing what the contribution is and linking to the original
publication.
You can take a look at a couple of recent contributions to use as examples:
https://github.com/rdkit/rdkit/tree/master/Contrib/CalcLigRMSD
https://github.com/rdkit/rdkit/tree/master/Contrib/NIBRSubstructureFilters

A quick aside on RDL: the RDKit can optionally use the RDL library, but it
looks like it doesn't currently expose everything RingEntropy.py needs to
Python. If the code is available in Contrib, we can take a look at doing a
C++ re-implementation of the new descriptor(s) and putting that in the
core. The advantage of the C++ implementation is that it would make
the descriptor more broadly accessible. We've done this a couple of times
in the past with code from Contrib.

Best,
-greg

On Mon, Mar 29, 2021 at 9:37 PM Geoffrey Hutchison <
geoff.hutchi...@gmail.com> wrote:

> Hi Greg,
>
> We just published a paper with a linear model predicting conformational
> entropies:
> https://pubs.acs.org/doi/10.1021/acs.jctc.0c01213
> https://github.com/hutchisonlab/molecular-entropies
>
> We built the notebooks on top of RDKit - and of course would like to
> contribute the descriptor back to RDKit.
>
> Is there a contribution guide (e.g., how to structure the code before a
> pull request)? In particular, this one has a few components that might
> prove useful as separate calls (e.g., the ring flexibility measure).
>
> Thanks,
> -Geoff
>
> ---
> Prof. Geoffrey Hutchison
> Department of Chemistry
> University of Pittsburgh
> tel: (412) 648-0492
> email: geo...@pitt.edu
> twitter: @ghutchis
> web: https://hutchison.chem.pitt.edu/
>
>


Re: [Rdkit-discuss] [Rdkit-announce] 2021.03.1 RDKit Release

2021-03-28 Thread Greg Landrum
Hi Drew,

Thanks for pointing out the problem. I had inadvertently done the conda
builds using freetype, but I forgot to add a freetype dependency.
It should be fixed now. Note: I removed the old builds and uploaded new ones,
so you'll probably need to do a conda uninstall and then conda install
again.

-greg


On Sat, Mar 27, 2021 at 8:09 PM Drew Gibson 
wrote:

> Hi,
>
> just a heads-up that I'm seeing the following error on MacOS on trying to
> create the rdkit extension in a chembl_28 db.
>
> chembl_28=# create extension if not exists rdkit;
> ERROR:  could not load library
> "/Users/drew/anaconda3/envs/rdkit-psql-2021/lib/rdkit.so":
> dlopen(/Users/drew/anaconda3/envs/rdkit-psql-2021/lib/rdkit.so, 10):
> Library not loaded: /usr/local/opt/freetype/lib/libfreetype.6.dylib
>   Referenced from: /Users/drew/anaconda3/envs/rdkit-psql-2021/lib/rdkit.so
>   Reason: image not found
> chembl_28=#
>
> I'm using a Mac Mini 2018 with Big Sur version 11.2.3.  I created my conda
> environment using the postgresql=12.2 option - haven't tried the others but
> then the issue doesn't seem related to postgresql.
>
> I subsequently installed the 2020.03.3 version with postgresql=12.2 and
> have had no problems.
>
> Thanks !
>
> Drew
>
>
> On Fri, 26 Mar 2021 at 15:14, Greg Landrum  wrote:
>
>> Dear all,
>>
>> I'm pleased to announce that the 2021.03 version of the RDKit is
>> released. We actually managed to get the .03 release done during March.
>> Shocking! ;-)
>> The release notes are below.[1]
>>
>> The release files are on the github release page:
>> https://github.com/rdkit/rdkit/releases/tag/Release_2021_03_1
>> The DOI for this release is:
>> https://doi.org/10.5281/zenodo.4639022
>>
>> I do not plan to do conda builds for the Python wrappers in the rdkit
>> channel for this release. The builds done as part of the conda-forge
>> project are automated and cover more Python versions and operating systems
>> than I could ever hope to do manually.
>> Please install the rdkit using conda-forge:
>> conda install -c conda-forge rdkit
>> I believe that the conda-forge builds of the new version should appear
>> over the next couple of days.
>>
>> I hope to finish the conda builds of the PostgreSQL cartridge for linux
>> and the mac and have them available in the rdkit channel by later today
>> or tomorrow.
>>
>> The online version of the documentation at rdkit.org (
>> http://rdkit.org/docs/index.html) has been updated.
>>
>> Thanks to everyone who submitted code, bug reports, and suggestions for
>> this release!
>>
>> Please let me know if you find any problems with the release or have
>> suggestions for the next one, which is scheduled for September/October 2021.
>>
>> Best Regards,
>> -greg
>> [1] We probably should figure out some way to make the release notes a
>> bit less verbose. ;-)
>>
>>
>> # Release_2021.03.1
>> (Changes relative to Release_2020.09.1)
>>
>> ## Backwards incompatible changes
>> - The distance-geometry based conformer generation now by default generates
>>   trans(oid) conformations for amides, esters, and related structures. This
>>   can be toggled off with the `forceTransAmides` flag in EmbedParameters.
>>   Note that this change does not impact conformers created using one of the
>>   ET versions. (#3794)
>> - The conformer generator now uses symmetry by default when doing RMS
>>   pruning. This can be disabled using the `useSymmetryForPruning` flag in
>>   EmbedParameters. (#3813)
>> - Double bonds with unspecified stereochemistry in the products of chemical
>>   reactions now have their stereo set to STEREONONE instead of STEREOANY
>>   (#3078)
>> - The MolToSVG() function has been moved from rdkit.Chem to rdkit.Chem.Draw
>>   (#3696)
>> - There have been numerous changes to the RGroup Decomposition code which
>>   change the results. (#3767)
>> - In RGroup Decomposition, when onlyMatchAtRGroups is set to false, each
>>   molecule is now decomposed based on the first matching scaffold which
>>   adds/uses the least number of non-user-provided R labels, rather than
>>   simply the first matching scaffold.
>>   Among other things, this allows the code to provide the same results for
>>   both onlyMatchAtRGroups=true and onlyMatchAtRGroups=false when suitable
>>   scaffolds are provided without requiring the user to get overly concerned
>>   about the input ordering of the scaffolds. (#3

Re: [Rdkit-discuss] 2021.03.1 RDKit Release

2021-03-26 Thread Greg Landrum
Apologies, I forgot to remove the beta tag in a file in the release, so
I've deleted that tag and done a new one.
The new DOI is:
https://zenodo.org/record/4639764

Everything else remains the same.

sorry for the noise,
-greg
p.s. hopefully there are no other silly mistakes/oversights...

On Fri, Mar 26, 2021 at 4:13 PM Greg Landrum  wrote:

> Dear all,
>
> I'm pleased to announce that the 2021.03 version of the RDKit is released.
> We actually managed to get the .03 release done during March. Shocking! ;-)
> The release notes are below.[1]
>
> The release files are on the github release page:
> https://github.com/rdkit/rdkit/releases/tag/Release_2021_03_1
> The DOI for this release is:
> https://doi.org/10.5281/zenodo.4639022
>
> I do not plan to do conda builds for the Python wrappers in the rdkit
> channel for this release. The builds done as part of the conda-forge
> project are automated and cover more Python versions and operating systems
> than I could ever hope to do manually.
> Please install the rdkit using conda-forge:
> conda install -c conda-forge rdkit
> I believe that the conda-forge builds of the new version should appear
> over the next couple of days.
>
> I hope to finish the conda builds of the PostgreSQL cartridge for linux
> and the mac and have them available in the rdkit channel by later today
> or tomorrow.
>
> The online version of the documentation at rdkit.org (
> http://rdkit.org/docs/index.html) has been updated.
>
> Thanks to everyone who submitted code, bug reports, and suggestions for
> this release!
>
> Please let me know if you find any problems with the release or have
> suggestions for the next one, which is scheduled for September/October 2021.
>
> Best Regards,
> -greg
> [1] We probably should figure out some way to make the release notes a bit
> less verbose. ;-)
>
>
> # Release_2021.03.1
> (Changes relative to Release_2020.09.1)
>
> ## Backwards incompatible changes
> - The distance-geometry based conformer generation now by default generates
>   trans(oid) conformations for amides, esters, and related structures. This
>   can be toggled off with the `forceTransAmides` flag in EmbedParameters.
>   Note that this change does not impact conformers created using one of the
>   ET versions. (#3794)
> - The conformer generator now uses symmetry by default when doing RMS
>   pruning. This can be disabled using the `useSymmetryForPruning` flag in
>   EmbedParameters. (#3813)
> - Double bonds with unspecified stereochemistry in the products of chemical
>   reactions now have their stereo set to STEREONONE instead of STEREOANY
>   (#3078)
> - The MolToSVG() function has been moved from rdkit.Chem to rdkit.Chem.Draw
>   (#3696)
> - There have been numerous changes to the RGroup Decomposition code which
>   change the results. (#3767)
> - In RGroup Decomposition, when onlyMatchAtRGroups is set to false, each
>   molecule is now decomposed based on the first matching scaffold which
>   adds/uses the least number of non-user-provided R labels, rather than
>   simply the first matching scaffold.
>   Among other things, this allows the code to provide the same results for
>   both onlyMatchAtRGroups=true and onlyMatchAtRGroups=false when suitable
>   scaffolds are provided without requiring the user to get overly concerned
>   about the input ordering of the scaffolds. (#3969)
> - There have been numerous changes to
>   `GenerateDepictionMatching2DStructure()` (#3811)
> - Setting the kekuleSmiles argument (doKekule in C++) to MolToSmiles will
>   now cause the molecule to be kekulized before SMILES generation. Note that
>   this can lead to an exception being thrown. Previously this argument would
>   only write kekulized SMILES if the molecule had already been kekulized
>   (#2788)
> - Using the kekulize argument in the MHFP code will now cause the molecule
>   to be kekulized before the fingerprint is generated. Note that because
>   kekulization is not canonical, using this argument currently causes the
>   results to depend on the input atom numbering. Note that this can lead to
>   an exception being thrown. (#3942)
> - Gradients for angle and torsional restraints in both UFF and MMFF were
>   computed incorrectly, which could give rise to potential instability
>   during minimization. As part of fixing this problem, force constants have
>   been switched to using kcal/degree^2 units instead of kcal/rad^2 units,
>   consistently with the fact that angle and dihedral restraints are
>   specified in degrees. (#3975)
>
> ## Highlights
> - MolDraw2D now does a much better job of handling query features like
>   common quer
