Re: [Rdkit-discuss] Problem getting valence

2018-07-26 Thread Lewis Martin
Thank you Paolo and Chris! These hydrogens were added while editing the molecule beforehand but I assumed sanitizing would remove them. Cheers Lewis On Thu, 26 Jul 2018 at 7:59 pm, Chris Earnshaw wrote: > Hi > > It looks to me like N5 [nH:5] also has a problem. This has 3 connections > to heavy

[Rdkit-discuss] Pharmacophore atom typing for torsion or atom pair FP

2019-01-29 Thread Lewis Martin
Hi rdkitters, I'd like to compare the similarity of torsion/atom pair FPs using standard atomic numbering with those using pharmacophore types, like the 'CATS' atom typing developed by Gisbert Schneider, and hoped someone has some advice here. *CATS* is a pharmacophore atom typing system with

Re: [Rdkit-discuss] Pharmacophore atom typing for torsion or atom pair FP

2019-01-31 Thread Lewis Martin
uman > readable/understandable string rather than some (obscure) integer. > > I am interested to look at the atom types used by the ECFP > and the FCFP fingerprints. > > Thanks a lot, > Francois. > > On 31/01/2019 08:49, Lewis Martin wrote: > > Thanks so much Greg! >

Re: [Rdkit-discuss] Pharmacophore atom typing for torsion or atom pair FP

2019-01-30 Thread Lewis Martin
f days. > > -greg > > > On Wed, Jan 30, 2019 at 4:59 AM Lewis Martin > wrote: > >> Hi rdkitters, >> I'd like to compare the similarity of torsion/atom pair FPs using >> standard atomic numbering with those using pharmacophore types, like the >> 'CATS' at

Re: [Rdkit-discuss] any paper on fingerprint pre-selection/feature selection?

2019-06-17 Thread Lewis Martin
There have been some comparisons between different fingerprints, which I'm sure can be found via Google, but I haven't seen feature selection. If you're looking for dimensionality reduction, anecdotally I've noticed that hashing bit vectors down to size has no benefit over the traditional folding,
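For readers unfamiliar with the "traditional folding" mentioned above, here is a minimal pure-Python sketch. It assumes folding means OR-ing the two halves of a bit vector together to halve its length; the function name `fold_once` is hypothetical and this is not RDKit's implementation.

```python
def fold_once(bits):
    """Halve a bit vector by OR-ing its first half with its second half.

    Assumes len(bits) is even; `bits` is a list of 0/1 integers.
    """
    if len(bits) % 2 != 0:
        raise ValueError("bit vector length must be even to fold it")
    half = len(bits) // 2
    # Bit i of the result is set if bit i or bit i + half was set.
    return [a | b for a, b in zip(bits[:half], bits[half:])]


example = [1, 0, 0, 0, 0, 0, 1, 0]
folded = fold_once(example)
print(folded)
```

Repeated calls shrink the vector further, at the cost of more collisions between unrelated bits.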

Re: [Rdkit-discuss] How do rdFingerprintGenerator.GetMorganGenerator and AllChem.GetMorganFingerprintAsBitVect differ?

2019-07-10 Thread Lewis Martin
s() method on bit vectors and the similarity calculation code > in rdkit.DataStructs. Take a look at DataStructs.DiceSimilarity() > > Hope this helps, > -greg > > > > On Wed, Jul 10, 2019 at 3:53 AM Lewis Martin > wrote: > >> Hi all, >> Quick question on truncated finger
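The quoted reply points to DataStructs.DiceSimilarity(). As a reference for what that kind of measure computes, here is a small pure-Python sketch of the Dice coefficient on sets of on-bit indices. The helper name `dice_similarity` and the choice to return 0.0 for two empty fingerprints are assumptions for illustration, not RDKit's code.

```python
def dice_similarity(on_bits_a, on_bits_b):
    """Dice coefficient: 2 * |A & B| / (|A| + |B|) for two sets of on-bits."""
    a, b = set(on_bits_a), set(on_bits_b)
    if not a and not b:
        # Convention chosen for this sketch only: two empty sets score 0.0.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


score = dice_similarity({1, 2, 3}, {2, 3, 4})
print(round(score, 3))
```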

[Rdkit-discuss] What is returned by GetMMFFVdWParams?

2019-07-02 Thread Lewis Martin
Hi all, Can anyone please help explain what values are returned by GetMMFFVdWParams? It takes two indices as input, so is it an interaction term between the two? Or is it the well depth and minimum (i.e. epsilon and R)? Example: In: m = Chem.MolFromSmiles('C1CCC1OC') m2=Chem.AddHs(m)

[Rdkit-discuss] How do rdFingerprintGenerator.GetMorganGenerator and AllChem.GetMorganFingerprintAsBitVect differ?

2019-07-09 Thread Lewis Martin
Hi all, Quick question on truncated fingerprints, any help is really appreciated. I think I've missed a trick on how the new fingerprint generator works. I thought the below should produce equivalent fingerprints but they are totally different. Has the implementation changed, or maybe I'm

[Rdkit-discuss] Error parsing a MUTAG smiles

2020-03-04 Thread Lewis Martin
Hi all, I'm chasing up a small puzzle on parsing SMILES codes if anyone's interested, but it's not directly RDKit. I was looking at the molecules in the MUTAG dataset, which is commonly used in graph learning research. Mostly these are just shared as graphs (i.e. vertices and edges) rather than

[Rdkit-discuss] Smallest possible size of 100*1e6 morgan fingerprints for storage and memory

2020-09-08 Thread Lewis Martin
Hi RDKit, Looking for advice on an RDKit-adjacent problem please. Ultimately I'd like to fit an approximate-nearest neighbors index on a dataset of 100 million ligands, featurized by Morgan fingerprint. The text file of the SMILES is ~6 GB but this blows out when loaded with pandas.read_csv() or
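A rough back-of-the-envelope estimate helps frame the storage question. The sketch below assumes each fingerprint is stored as tightly packed bits with no per-object overhead; the fingerprint lengths tried (256, 1024, 2048) are illustrative assumptions, not values taken from the thread, and `packed_size_bytes` is a hypothetical helper.

```python
def packed_size_bytes(n_fingerprints, n_bits):
    """Bytes needed to store n_fingerprints bit vectors of n_bits each,
    packed 8 bits per byte with no container overhead."""
    bytes_per_fp = (n_bits + 7) // 8  # round up to whole bytes
    return n_fingerprints * bytes_per_fp


n_ligands = 100 * 10**6  # 100 million, as in the post
for n_bits in (256, 1024, 2048):
    gigabytes = packed_size_bytes(n_ligands, n_bits) / 1e9
    print(f"{n_bits} bits: {gigabytes:.1f} GB")
```

Real in-memory sizes will be larger once Python object or DataFrame overhead is included, which is consistent with the blow-up described in the post.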

Re: [Rdkit-discuss] Smallest possible size of 100*1e6 morgan fingerprints for storage and memory

2020-09-08 Thread Lewis Martin
arity searching > with short fingerprints: > http://rdkit.blogspot.com/2020/08/doing-similarity-searches-with-highly.html > > -greg > > > On Wed, Sep 9, 2020 at 2:37 AM Lewis Martin > wrote: > >> Hi RDKit, >> >> Looking for advice on an rdkit-adjacent

Re: [Rdkit-discuss] Smallest possible size of 100*1e6 morgan fingerprints for storage and memory

2020-09-08 Thread Lewis Martin
. Chemfp is amazing but brute-forcing 100 million by 100 million would surely still take a long time compared with an approximate nearest neighbor approach. Straying from RDKit so I'll leave it there - thanks! On Wed, Sep 9, 2020 at 11:29 AM Francois Berenger wrote: > On 09/09/2020 09:35, Lewis Mar

[Rdkit-discuss] Largest possible Morgan fp bit magnitude

2020-10-08 Thread Lewis Martin
Hi all, Felt sure this would have been asked but I can't find it. What is the 'largest' possible bit in an unfolded Morgan fingerprint? Asked another way, what type of number are the substructure identities hashed into? The Rogers and Hahn ECFP paper says that they hash into a 32-bit integer, and
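To make the 32-bit statement from the Rogers and Hahn paper concrete, this short sketch lists the value ranges of unsigned and signed 32-bit integers. Whether RDKit's unfolded identifiers are signed or unsigned is not stated in this snippet, so both ranges are shown; `fits_uint32` is a hypothetical helper.

```python
# Value ranges for 32-bit integers.
UINT32_MAX = 2**32 - 1                       # largest unsigned 32-bit value
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1   # signed 32-bit range


def fits_uint32(value):
    """True if value can be represented as an unsigned 32-bit integer."""
    return 0 <= value <= UINT32_MAX


print(UINT32_MAX, INT32_MIN, INT32_MAX)
```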

Re: [Rdkit-discuss] GPU Implementation of shape-based 3D overlap on rdkit?

2020-11-03 Thread Lewis Martin
I've had an initial go at something like this using JAX. I chose JAX since it has a shallow learning curve, essentially being numpy on a GPU. This is great for vectorized calculations, but less so for applications that involve a lot of control flow (i.e. if/else statements), which as I understand it

Re: [Rdkit-discuss] XYZ to mol ???

2021-06-06 Thread Lewis Martin
I know this doesn't address the question exactly - but you can also do this (using RDKit) via Jan Jensen and colleagues' xyz2mol --> https://github.com/jensengroup/xyz2mol - lew On Sat, Jun 5, 2021 at 10:56 AM Storer, Joey (J) wrote: > Dear all, > > > > For molecular modeling workflows and

[Rdkit-discuss] MolsToGridImage drawing multiple conformers of single molecule

2021-06-07 Thread Lewis Martin
Hi all, Is there a way to draw multiple conformers of a single molecule using Draw.MolsToGridImage? Here's a first attempt, but either all conformers are exactly the same or the parent molecule falls back to conformer 0, since all molecules in the grid appear the same: ``` from rdkit import Chem

[Rdkit-discuss] Using MolsToGridImage to draw multiple conformers of a single molecule

2021-06-06 Thread Lewis Martin
Apologies if this doubles up, I think sourceforge was having issues... Hi all, Is there a way to draw multiple conformers of a single molecule using Draw.MolsToGridImage? Here's a first attempt, but either all conformers are exactly the same or the parent molecule falls back to conformer 0,

[Rdkit-discuss] Calculating strain energy of conformers

2021-07-11 Thread Lewis Martin
Hi RDKit, I'm exploring strain energies in the context of virtual screening, something that has been considered for a while[1] and is still being explored today[2]. There may not be a canonical way, but is this a valid/good way to calculate strain energy? I'm just not sure if I'm using the MMFF

Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-27 Thread Lewis Martin
> > Another idea: try to get your pdb file through the pdbredo service. > https://pdb-redo.eu/ > They might have fixed a few things; maybe this PDB will read better in > rdkit. > > Regards, > F. > > On 26/09/2021 17:02, Lewis Martin wrote: > > Hi RDKit, > > W

[Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-26 Thread Lewis Martin
Hi RDKit, While parsing proteins from the PDB with RDKit, I've come across situations where the distance-based bond determination leads to 'incorrect' bonds between atoms that are erroneously too close together. PDB files have no bond information, so it's not really 'incorrect' (rather the model

[Rdkit-discuss] Programmatic access to MMFF torsion indices and parameters

2021-11-19 Thread Lewis Martin
Hi all, Does anyone have a way to access the atom indices and the parameters for all of the dihedrals (aka torsions) in an MMFF force field object? I noticed that one can set mmffVerbosity=2 to print them to stdout, but I'd like to access them programmatically. I believe Paolo Tosco must have done

Re: [Rdkit-discuss] Query on a failed molecule from SureChEMBL

2021-12-15 Thread Lewis Martin
ke > these. > > This gist shows how to write reaction rules for your cases (I guessed for > what the Ns are supposed to be) and then use them: > https://gist.github.com/greglandrum/8fd229bc6bf6c734d1c21da7f2bebebb > > Hope this helps, > -greg > > > On Wed, Dec 15, 2

[Rdkit-discuss] Query on a failed molecule from SureChEMBL

2021-12-14 Thread Lewis Martin
Hi All, Reading molecules from a bulk download of SureChEMBL, I come across a fair few molecules that fail to parse. Not sure whether they SHOULD parse or not. Here is an example: https://www.surechembl.org/chemical/SCHEMBL386 with SMILES code: COC(=O)C1=C(C=CC=C1)C1=CC=C(C[N+]#[N]=[N-])C=C1

[Rdkit-discuss] Using the bitsPerPoint argument of ShapeTanimotoDist

2021-07-21 Thread Lewis Martin
Hi RDKit, How does one input the number of bits to the ShapeTanimotoDist function? The docs indicate the default is rdkit.DataStructs.cDataStructs.DiscreteValueType.TWOBITVALUE, but I tried some other values and this gave unexpected results. Specifically: when increasing to higher bit values,