[Rdkit-discuss] canonicalization of two aromatic molecules returning two different forms (kekule and aromatic)

2020-11-27 Thread Alexis Parenty
Hi everyone, Why is it that when I canonicalize the following smiles_1 I get its unexpected Kekulé form, whereas when I canonicalize a similar smiles_2, I get its expected aromatic form? from rdkit import Chem smiles1 = Chem.CanonSmiles("N12C=CC=CC1=NCC2") smiles1 ==> 'C1=CC2=NCCN2C=C1' smiles2

Re: [Rdkit-discuss] canonicalization of two aromatic molecules returning two different forms (kekule and aromatic)

2020-11-27 Thread Alexis Parenty
=C1C=CC=CN1C") > > In [7]: > > mol2 > > Out[7]: > [image: image.png] > In [8]: > > smiles2 = Chem.MolToSmiles(mol2) > > In [9]: > > smiles2 > > Out[9]: > > 'CN=c1n1C' > > > > > > > > > > In [10]: >

Re: [Rdkit-discuss] canonicalization of two aromatic molecules returning two different forms (kekule and aromatic)

2020-11-28 Thread Alexis Parenty
he very least unexpected >>> behaviour) in the aromaticity code. The issue isn’t the aromaticity of the >>> imidazole/dihydroimidazole, but the aromaticity of the pyridyl. Alexis’ >>> second molecule is identical to the first except that one bond in the >>> 5-m

Re: [Rdkit-discuss] canonicalization of two aromatic molecules returning two different forms (kekule and aromatic)

2020-12-04 Thread Alexis Parenty
Dear Rdkiters, I could not pickle my models when using the new chemical descriptors, therefore, I have re-installed the latest patched version of rdkit 2020.09.02 through conda-forge. (ref Fixes #3511 #3513) https://anaconda.org/conda-forge/rdkit However, this new version still does not let me p

Re: [Rdkit-discuss] canonicalization of two aromatic molecules returning two different forms (kekule and aromatic)

2020-12-04 Thread Alexis Parenty
The 2020.09.01 is just an oversight; you are indeed the 2020.09.02 > version. > > Cheers, > p. > > On Fri, Dec 4, 2020 at 1:32 PM Alexis Parenty < > alexis.parenty.h...@gmail.com> wrote: > >> Dear Rdkiters, >> >> I could not pickle my models when

[Rdkit-discuss] cross-platform issues with rdkit 2020.09.03 ?

2021-02-19 Thread Alexis Parenty
Dear Rdkiters, I use the latest version of rdkit (2020.09.03) on both Windows and Linux platforms, from identical Conda Python 3.9 environments. I have noticed cross-platform incompatibility issues after building an ML model on my Linux machine and trying to run it on my Windows machine.

Re: [Rdkit-discuss] cross-platform issues with rdkit 2020.09.03 ?

2021-02-19 Thread Alexis Parenty
Landrum wrote: > Hi Alexis, > > How did you install the rdkit on these machines? > I see len(Descriptors._descList)=208 on both windows and linux using the > conda-forge rdkit builds for v2020.09.3 and v2020.09.4 > > -greg > > > On Fri, Feb 19, 2021 at 3:04 PM Alexi

[Rdkit-discuss] Cross platform inconsistency with the Descriptor module

2021-09-08 Thread Alexis Parenty
Hi everyone, I have noticed some inconsistencies in the list of rdkit chemical descriptors available on my Windows machine and my Linux machine. I am running the same rdkit version (2021.03.1) and the same Python version (3.9) on both platforms. Running the following from Windows: print(len(

Re: [Rdkit-discuss] Cross platform inconsistency with the Descriptor module

2021-09-08 Thread Alexis Parenty
come from the conda-forge channel and was built differently from the one > you have installed on Linux. > > Cheers, > p. > > On Wed, Sep 8, 2021 at 11:30 AM Alexis Parenty < > alexis.parenty.h...@gmail.com> wrote: > >> Hi everyone, >> >> I have noti

Re: [Rdkit-discuss] Cross platform inconsistency with the Descriptor module

2021-09-09 Thread Alexis Parenty
t that the fr_sulfone entry in >> the CSV file is duplicate. >> >> If the CSV file existed and were empty no fragment descriptors would be >> available. >> Similarly, if the file did not exist (maybe because RDConfig.RDDataDir >> is misconfigured), no fragmen

[Rdkit-discuss] HasSubstructMatch method with useChirality argument

2022-01-04 Thread Alexis Parenty
Hi everyone, Why is it that the following smarts C[C@]1CCCN(C1)C(C)=O does not match the following structure C[C@]1(CCCN(C1)C(C)=O)N when using the chirality argument in the HasSubstructMatch method? mol_frag = Chem.MolFromSmarts("C[C@]1CCCN(C1)C(C)=O") mol_structure = Chem.MolFromSmiles("C[C@]1

Re: [Rdkit-discuss] HasSubstructMatch method with useChirality argument

2022-01-04 Thread Alexis Parenty
g S and R stereoisomers and hence it is false while in the second > you are comparing R and R. > best, > Shani > > On Tue, Jan 4, 2022 at 1:22 PM Alexis Parenty < > alexis.parenty.h...@gmail.com> wrote: > >> Hi everyone, >> >> Why is it t

Re: [Rdkit-discuss] HasSubstructMatch method with useChirality argument

2022-01-04 Thread Alexis Parenty
; what is specified by the user in the SMARTS string, e.g.: > > mol_frag = Chem.MolFromSmarts("[C@@]") > print("mol_frag", mol_frag.GetAtomWithIdx(0).GetChiralTag()) > mol_frag CHI_TETRAHEDRAL_CW > > mol_frag = Chem.MolFromSmarts("[C@]") > print("mo
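The tag names quoted in this reply can be checked with a few lines. This is only a sketch, assuming RDKit is installed; the two single-atom SMARTS are the ones shown in the snippet, while the helper name `chiral_tag` is made up for illustration.

```python
from rdkit import Chem

def chiral_tag(smarts):
    """Return the chiral tag RDKit stores on the first atom of a SMARTS query."""
    query = Chem.MolFromSmarts(smarts)
    return query.GetAtomWithIdx(0).GetChiralTag()

# RDKit records '@@' as CHI_TETRAHEDRAL_CW and '@' as CHI_TETRAHEDRAL_CCW.
for smarts in ("[C@@]", "[C@]"):
    print(smarts, chiral_tag(smarts))
```

The tag only records what was written in the pattern; whether a given query matches a molecule with useChirality=True still depends on the neighbour ordering in both the query and the target.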

[Rdkit-discuss] rdSubstructLibrary and atom indexes involved in substructure matches

2022-01-12 Thread Alexis Parenty
Hi everyone, Is it possible to get more information on the atom indexes matching fragments when using the module rdSubstructLibrary? I use GetMatches to have the row indexes of the dataset that match a particular substructure but when there is a match, I would also want to know the atom indexes of

Re: [Rdkit-discuss] rdSubstructLibrary and atom indexes involved in substructure matches

2022-01-14 Thread Alexis Parenty
> get the atom indices should not be a major performance problem. > > Cheers, > p. > > On Wed, Jan 12, 2022 at 4:58 PM Alexis Parenty < > alexis.parenty.h...@gmail.com> wrote: > >> Hi everyone, >> >> Is it possible to get more information on the atom indexes mat

[Rdkit-discuss] Extracting SMILES from text

2016-12-02 Thread Alexis Parenty
Dear all, I am looking for a way to extract SMILES scattered in many text documents (thousands of documents of several pages each). At the moment, I am thinking of scanning each word in the text and trying to make a mol object from it using Chem.MolFromSmiles(), then storing the words if they return a m

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-02 Thread Alexis Parenty
some false positives > like "CC" which may occur in text (emails especially). > > > Pozdrawiam, | Best regards, > Maciek Wójcikowski > mac...@wojcikowski.pl > > 2016-12-02 10:11 GMT+01:00 Alexis Parenty : > >> Dear all, >> >> >> I am

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-02 Thread Alexis Parenty
occur in text (emails especially). >> >> >> Pozdrawiam, | Best regards, >> Maciek Wójcikowski >> mac...@wojcikowski.pl >> >> 2016-12-02 10:11 GMT+01:00 Alexis Parenty >> : >> >>> Dear all, >>> >>> >>

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-02 Thread Alexis Parenty
complexity as well. > > Pavel. > On 12/02/2016 11:11 AM, Greg Landrum wrote: > > An initial start on some regexps that match SMILES is here: > https://gist.github.com/lsauer/1312860/264ae813c2bd2c27a769d261c8c6b38da34e22fb > > that may also be useful > >

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-05 Thread Alexis Parenty
Dear All, Many thanks to everyone for your participation in that discussion. It was very interesting and useful. I have written a small script that took on board everyone’s input: This incorporates a few "text filters" before the RDKit function: First of all I made a dictionary of all the words p
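The idea of running cheap text filters before the RDKit parser can be sketched as below. This is not the script from the message (which is not shown in this listing); the character whitelist, the `known_words` set, and both function names are assumptions made for illustration. The final validity check is passed in as a callable so that something like `lambda s: Chem.MolFromSmiles(s) is not None` can be plugged in where RDKit is available.

```python
import re

# Hypothetical whitelist: letters, digits, and common SMILES punctuation.
SMILES_CHARS = re.compile(r"[A-Za-z0-9@+\-\[\]()=#$%/\\.:*]+")

def candidate_tokens(text, known_words, min_length=3):
    """Yield words that survive the cheap text filters."""
    for raw in text.split():
        token = raw.strip(".,;:!?\"'")       # drop sentence punctuation at the ends
        if len(token) < min_length:          # very short words such as "CC" are skipped
            continue
        if token.lower() in known_words:     # ordinary dictionary words
            continue
        if not SMILES_CHARS.fullmatch(token):
            continue
        yield token

def extract_smiles(text, known_words, is_valid, min_length=3):
    """Keep only the candidates accepted by the (expensive) validator."""
    return [t for t in candidate_tokens(text, known_words, min_length) if is_valid(t)]
```

Putting the dictionary and length filters first means the parser is only called on the small fraction of words that could plausibly be SMILES.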

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-05 Thread Alexis Parenty
c1) and the like, just reasserting that this is a hard > problem! > > > Brian Kelley > > On Dec 5, 2016, at 5:35 AM, Alexis Parenty > wrote: > > Dear All, > > Many thanks to everyone for your participation in that discussion. It was > very interesting an

[Rdkit-discuss] Drawing structure with generic labels

2017-02-16 Thread Alexis Parenty
Hi everyone, Is it possible to draw a structure from a SMARTS that contains generic labels? The following is a valid SMARTS for a structure with an undefined heteroatom [N,O,P,S] and an undefined halogen [F,Cl,Br,I]: [#7,#8,#15,#16]=C(CC1=CC([#9,#17,#35,#53])=CC=C1)[#7,#8,#15,#16] [image: Inl

[Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?

2017-05-17 Thread Alexis Parenty
Hi everyone, I am looking for a substructure match between a smarts and a smiles, but I want any heteroatom from the smarts to match any heteroatom from the smiles: [image: Inline images 1] The following does not return what I would expect: smarts1 = " [F,Cl,Br,I]C1=CC(C2[N,O,S]CC[N,O,S]C2)=CC

Re: [Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?

2017-05-17 Thread Alexis Parenty
Kekule notation. > Best, > Michal > > On 17 May 2017 at 12:55, Alexis Parenty > wrote: > >> Hi everyone, >> >> I am looking for substructure match between a smarts and a smiles, but I >> want any heteroatom from the smarts to match any heteroatom

Re: [Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?

2017-05-18 Thread Alexis Parenty
matic False > > Halogen.Bromine.BromoKetone False > > Halogen.NotFluorine True > > Halogen.NotFluorine.Aliphatic False > > Halogen.NotFluorine.Aromatic True > > Isocyanate False > > Isocyanate.Aliphatic False > > Isocyanate.Aromatic False > > Nitro False > > Nitro.Al

[Rdkit-discuss] How to transform SMARTS of aromatic structures so that their aromatic atoms could be any?

2017-05-19 Thread Alexis Parenty
Hi everyone, I need a function that could generalize any aromatic rings in a SMARTS: [image: Inline images 1] I have noticed that it is possible to rearrange most SMARTS strings into general aromatic SMARTS strings by following these simple rules: 1 Exchange any low

Re: [Rdkit-discuss] How to transform SMARTS of aromatic structures so that their aromatic atoms could be any?

2017-05-19 Thread Alexis Parenty
if the chemistry is correct.* > > Best, > > Christos > > Christos Kannas > > Researcher > Ph.D Student > > On 19 May 2017 at 12:52, Alexis Parenty >

[Rdkit-discuss] Memory issue when storing more than 300K mol in a list

2017-06-04 Thread Alexis Parenty
Dear RDKit community, I need to screen for substructure relationships between two sets of structures (1 000 X 500 000): I thought I should build two lists of mol objects from SMILES, but I keep having a memory error when the second list reaches 300 000 mol. All my RAM (12G) gets consumed along wit

Re: [Rdkit-discuss] Memory issue when storing more than 300K mol in a list

2017-06-09 Thread Alexis Parenty
): >...: if m is None: >...: continue >...: matches.append([m.HasSubstructMatch(q) for q in queries]) >...: > > > > Brian has some thoughts on making this particular use case easier/faster > (in particular by adding multi-threading support), so maybe ther

Re: [Rdkit-discuss] Memory issue when storing more than 300K mol in a list

2017-06-09 Thread Alexis Parenty
None: > continue > matches.append([m.HasSubstructMatch(q) for q in queries]) > > > > The second form consumes a lot more memory without delivering any > improvement in performance. > > Best, > -greg > > > On Fri, Jun 9, 2017 at 3:33 PM, Alexis Parenty <
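The loop quoted above parses one molecule, tests it against every query, and lets it go before the next one is built, so only the boolean results accumulate. A self-contained sketch of that pattern follows, assuming RDKit is installed; the generator name `iter_match_rows` and the tiny in-memory SMILES list are illustrative assumptions, and in practice the SMILES would be read lazily from a file.

```python
from rdkit import Chem

def iter_match_rows(smiles_iter, queries):
    """Yield (smiles, [bool per query]) one molecule at a time."""
    for smi in smiles_iter:
        m = Chem.MolFromSmiles(smi)
        if m is None:            # unparsable input is skipped, as in the quoted loop
            continue
        yield smi, [m.HasSubstructMatch(q) for q in queries]

# Queries are few, so keeping them all in memory is cheap.
queries = [Chem.MolFromSmarts(s) for s in ("c1ccccc1", "[OX2H]")]
smiles = ["c1ccccc1O", "CCO", "not_a_smiles", "CCC"]

# Only the result rows are stored; each mol object is released after its row is built.
matches = [row for _, row in iter_match_rows(smiles, queries)]
```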

[Rdkit-discuss] Chem.MolToSmarts(mol) output

2017-07-14 Thread Alexis Parenty
Dear Rdkiters, I sometimes get smarts from mol in atomic number notation such as: [#6]-[#7+]1=[#6]2-[#6]3:[#7]:[#6]:[#6]:[#6]:[#6]:3-[#6]3:[#6]:[#6]:[#6]:[#6]:[#6]:3-[#7]-2-[#6]-[#6]-1 Is there a way to force the method Chem.MolToSmarts(mol) to output a smarts using alphabetic letters instead

[Rdkit-discuss] "Markush SMARTS" ?

2017-09-29 Thread Alexis Parenty
Dear rdkiters, I am interested in capturing, in a single SMARTS notation, aromatic systems with several possible substitution positions (ortho, meta, para). Is there a way using rdkit to convert, for example, the three structures below into a SMARTS notation that would match the three structures when I d

[Rdkit-discuss] How to generate bioisosters?

2018-02-05 Thread Alexis Parenty
Dear RDKiters, I would like to generate the bioisosteres of amides from a large list of structures: The smarts patterns for the bioisosteres of amides I am interested in are: smarts_path = ['C1=CN=[CH1][CH1]=N1', 'C1=[CH1]N=C[CH1]=N1', 'C1=[CH1]N=[CH1]C=N1', 'OC1=[CH1]C=NO1', 'OC1=NOC=[CH1]1', 'C1

Re: [Rdkit-discuss] How to generate bioisosters?

2018-02-06 Thread Alexis Parenty
m.MolFromSmiles('C1CC1C(=O)Nc1c > 1'),)) > > In [13]: Chem.MolToSmiles(ps[0][0]) > Out[13]: 'c1ccc(-n2cc(C3CC3)nn2)cc1' > > > Notice that I added the bioisostere itself to the products of the reaction > as SMILES. You don't want query features

Re: [Rdkit-discuss] How to generate bioisosters?

2018-02-06 Thread Alexis Parenty
> On Tue, Feb 6, 2018 at 10:42 AM, Alexis Parenty < > alexis.parenty.h...@gmail.com> wrote: > >> I will try your approach and will nest all the result smiles into a >> unique recursive smiles. >> > I'm not quite sure what you mean here, but it sounds unlikely

Re: [Rdkit-discuss] issue during parsing a smile

2018-04-16 Thread Alexis Parenty
Hi Guillaume, you have a trivalent oxygen here in your heterocycle "O2O" Best, Alexis On 16 April 2018 at 16:29, Guillaume GODIN wrote: > Dear Andrew, > > Thank you! > > And for this one C[C@@]12CC[C@@](C)(CC1)O2O any idea > > Cause your tool failed too. > > BR, > > Guillaume > > Le 16.04.18

[Rdkit-discuss] optimizing substructure search

2018-08-18 Thread Alexis Parenty
Dear rdkiter, I’d like to optimize an algorithm that is slow due to substructure searches. I am doing several million substructure searches using mol1.HasSubstructMatch(mol2). I have hundreds of mol1s and millions of mol2s. Most of the time mol2 is not a substructure of mol1 so I was thinki

Re: [Rdkit-discuss] Fingerprints standardization and cleaning

2018-09-14 Thread Alexis Parenty
Hi Mario, I use the python library MolVS (from Matt Swain, 2016) https://molvs.readthedocs.io/en/latest/ Features - Normalization of functional groups to a consistent format. - Recombination of separated charges. - Breaking of bonds to m

Re: [Rdkit-discuss] smarts substructure query match = FALSE?

2018-09-18 Thread Alexis Parenty
Hi Steeve, you have an imine bond in your smarts instead of an aromatic bond. It cannot match. [image: image.png] Best, Alexis On Tue, 18 Sep 2018 at 15:30, Stephen O'hagan wrote: > Hi folks, > > > > This looks as if HasSubstructMatch should return TRUE, so why is it FALSE? > [Python 3.6, RDK

[Rdkit-discuss] Inchi/smiles conversion issue

2019-06-18 Thread Alexis Parenty
Dear RdKiters, Why is it that the stable tautomer of the following structure is lost during InChI/SMILES conversion? [image: image.png] mol = Chem.MolFromSmiles("Cc1ccc([nH]nc2)c2c1") inchi = Chem.MolToInchi(mol) mol = Chem.MolFromInchi(inchi) smiles = Chem.MolToSmiles(mol) print(smiles) *==

Re: [Rdkit-discuss] Inchi/smiles conversion issue

2019-06-18 Thread Alexis Parenty
ibute for the inchi. But beware this makes it a non standard > Inchi, and thus might not be comparable to other Inchis. > > Hope this helps, > > Jennifer > On 18.06.19 12:59, Alexis Parenty wrote: > > Dear RdKiters, > > Why is it that the stable tautomer of the

[Rdkit-discuss] Highlighting some parts of a structure

2020-01-22 Thread Alexis Parenty
Hi everyone, I use SimilarityMaps.GetSimilarityMapFromWeights(mol, atom_ids) to highlight some parts of a structure, but is it also possible to change the thickness of some bonds of a structure knowing their atom IDs? If selected bonds cannot be bold, can we change their color? Many thanks and re

Re: [Rdkit-discuss] Highlighting some parts of a structure

2020-01-23 Thread Alexis Parenty
> directly changed is interesting... > > -greg > > > > > > On Wed, Jan 22, 2020 at 6:17 PM Alexis Parenty < > alexis.parenty.h...@gmail.com> wrote: > >> Hi everyone, >> >> I use SimilarityMaps.GetSimilarityMapFromWeights(mol, atom_ids) to >

[Rdkit-discuss] Smarts notation for aromatic regioisomers

2020-01-29 Thread Alexis Parenty
Hi everyone, Is there a way to get a substructure match of regioisomers using a smarts by separating the fragments with “.”? The following approach works but is too permissive since it will also match structures with a bromide or a chloride linked to an aliphatic carbon... [image: image.png]

Re: [Rdkit-discuss] Smarts notation for aromatic regioisomers

2020-01-29 Thread Alexis Parenty
ed fragments are not going to work for this, as you >> describe. You need to use recursive SMARTS (see >> https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html section >> 4.4). Something like: >> Clc[$(cBr);$(ccBr);$(cccBr)] >> should (I hope!) be a reason

Re: [Rdkit-discuss] Smarts notation for aromatic regioisomers

2020-01-30 Thread Alexis Parenty
a "matching atom" > is the first atom in each of the query molecules. > > Best > -greg > [1] plus it was kind of fun to think about and put together. > > > > > On Wed, Jan 29, 2020 at 7:21 PM Alexis Parenty < > alexis.parenty.h...@gmail.com> wrote: >

[Rdkit-discuss] Doing substructure search as quickly as possible...

2020-02-10 Thread Alexis Parenty
Dear Rdkiters, I am interested in doing substructure searches between many thousands of structures and many thousands of fragments, as quickly as possible, with reasonable accuracy (> 0.95)... I did read Greg's excellent post on that subject: http://rdkit.blogspot.com/2019/07/a-couple-of-substructu

Re: [Rdkit-discuss] Doing substructure search as quickly as possible...

2020-02-10 Thread Alexis Parenty
ed usage and code snippets > you can find on RDKit blog post that Greg has put together here: > https://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html > > Best, > Maciek > > > Pozdrawiam, | Best regards, > Maciek Wójcikowski > mac...@wojcikowski
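The fingerprint-based screening that the linked blog post covers can be sketched roughly as follows. This is an illustration under assumptions, not code taken from the post: it assumes RDKit is installed, uses the pattern fingerprint with its default size, and the function name `screened_matches` is made up here.

```python
from rdkit import Chem, DataStructs

def screened_matches(query, mols):
    """Return indices of mols containing query, using a fingerprint prescreen."""
    query_fp = Chem.PatternFingerprint(query)
    hits = []
    for idx, mol in enumerate(mols):
        mol_fp = Chem.PatternFingerprint(mol)
        # Cheap test: every bit set for the query must also be set for the molecule.
        if not DataStructs.AllProbeBitsMatch(query_fp, mol_fp):
            continue
        # Only the survivors pay for the full substructure search.
        if mol.HasSubstructMatch(query):
            hits.append(idx)
    return hits

query = Chem.MolFromSmiles("c1ccccc1")
mols = [Chem.MolFromSmiles(s) for s in ("Cc1ccccc1", "C1CCCCC1", "Oc1ccccc1")]
hits = screened_matches(query, mols)
```

In a real screen the molecule fingerprints would be computed once and stored, so that each new query only costs one fingerprint plus the bit comparisons.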

[Rdkit-discuss] Substructure match when using smarts containing more than one part

2020-02-25 Thread Alexis Parenty
Dear RDkiter, Using HasSubstructMatch() I can match the following smarts “F[a]” and “c1c1” with "Fc1c1”. However, when I put the two fragments together in "F[a].c1c1" it no longer matches. I suppose this is the desired behaviour since the "any aromatic" atom [a] from F[a] that is also part

Re: [Rdkit-discuss] Substructure match when using smarts containing more than one part

2020-02-25 Thread Alexis Parenty
is Fa.[$(c1c1)] > > -greg > > On Tue, Feb 25, 2020 at 8:46 AM Alexis Parenty < > alexis.parenty.h...@gmail.com> wrote: > >> Dear RDkiter, >> >> Using HasSubstructMatch() I can match the following smarts “F[a]” and >> “c1c1” with "Fc1