On Sep 9, 2020, at 04:00, Lewis Martin wrote:
> I'd like to keep it FOSS since it's for academic publication and hopefully to
> be re-used. Chemfp is amazing but brute-forcing 100 million by 100 million
> would surely still take a long time compared with an approximate nearest
> neighbor
OK, to sum it up: for me, writing to binary is a neat, fast, and low-storage
solution for fingerprints. Example:
o = open('fingerprints.bin', 'wb')
gen_mo = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=64)
for smi in tqdm_notebook(df['smiles']):
    mol = Chem.MolFromSmiles(smi)
    fp
The most efficient (easy) way to store the fingerprints is using
DataStructs.BitVectToBinaryText(). That will return a 64-byte binary string
for a 512-bit fingerprint.
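A minimal sketch of the fixed-width storage idea follows. It assumes every fingerprint is exactly 64 bytes (512 bits), as quoted above, and uses stand-in byte strings so that it runs without RDKit; in practice each record would be the bytes returned by DataStructs.BitVectToBinaryText(). The names FP_BYTES, write_fingerprints, read_fingerprint and roundtrip_demo are hypothetical and only for illustration.

```python
import os
import tempfile

FP_BYTES = 64  # 512 bits / 8; assumed constant for every record


def write_fingerprints(path, fingerprints):
    """Write each fingerprint as one fixed-width binary record."""
    with open(path, "wb") as handle:
        for fp in fingerprints:
            if len(fp) != FP_BYTES:
                raise ValueError("expected %d bytes, got %d" % (FP_BYTES, len(fp)))
            handle.write(fp)


def read_fingerprint(path, index):
    """Seek straight to record `index` without loading the whole file."""
    with open(path, "rb") as handle:
        handle.seek(index * FP_BYTES)
        record = handle.read(FP_BYTES)
    if len(record) != FP_BYTES:
        raise IndexError(index)
    return record


def roundtrip_demo():
    """Write three stand-in records (NOT real fingerprints) and read one back."""
    fake = [bytes([i + 1]) * FP_BYTES for i in range(3)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fingerprints.bin")
        write_fingerprints(path, fake)
        size = os.path.getsize(path)
        second = read_fingerprint(path, 1)
    return size, second == fake[1]
```

Because every record has the same width, the file needs no index or delimiter: record i always starts at byte i * 64, so the file size is simply 64 bytes times the number of molecules.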
FWIW: if you haven't seen the recent blog post about similarity searching
with short fingerprints:
Cheers Francois - that might be the way to go actually. I'll try with
'bitstring' https://github.com/scott-griffiths/bitstring and I guess write
the data as concatenated bitarrays in chunked binary files.
I'd like to keep it FOSS since it's for academic publication and hopefully
to be re-used.
On 09/09/2020 01:33, Tim Dudgeon wrote:
Hi All,
thanks for the suggestions.
Greg, that's part of what's needed but there's also some more complex
logic needed. For instance, if the atom the H is attached to is
rotatable (e.g. an OH group) then it is more complex than if it is
fixed (e.g. an N in a
On 09/09/2020 09:35, Lewis Martin wrote:
Hi RDKit,
Looking for advice on an RDKit-adjacent problem please. Ultimately I'd
like to fit an approximate nearest-neighbors index on a dataset of 100
million ligands, featurized by Morgan fingerprint. The text file of
the SMILES is ~6 GB but this blows
Hi RDKit,
Looking for advice on an RDKit-adjacent problem please. Ultimately I'd like
to fit an approximate nearest-neighbors index on a dataset of 100 million
ligands, featurized by Morgan fingerprint. The text file of the SMILES is
~6 GB but this blows out when loaded with pandas.read_csv() or
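One possible way to keep memory bounded with a file of this size is to stream it in batches rather than load it in one go. The sketch below is only an illustration of that idea, not something proposed in the thread: iter_batches is a hypothetical helper, and the file name in the usage comment is made up.

```python
from itertools import islice


def iter_batches(lines, batch_size):
    """Yield lists of at most `batch_size` stripped lines from any iterable."""
    iterator = iter(lines)
    while True:
        batch = [line.strip() for line in islice(iterator, batch_size)]
        if not batch:
            return
        yield batch


# Hypothetical usage: only one batch of SMILES is held in memory at a time.
# with open("ligands.smi") as handle:
#     for batch in iter_batches(handle, 100_000):
#         ...  # featurize the batch and append the results to disk
```

A file object is itself a lazy iterator over lines, so passing the open handle keeps peak memory proportional to the batch size rather than to the full 100 million rows.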
On Sep 8, 2020, at 14:30, Mike Mazanetz wrote:
> Does anyone know whether it’s possible to obtain not just the fingerprint keys
> for MACCS (binary values) but the number of occurrences of the keys,
> particularly these details:
The SMARTS patterns for most of the MACCS keys are available by:
Hi Mike,
I put together a gist that might help:
https://gist.github.com/ptosco/7bbad9e6441724e9638bc4093f48e31b
This is basically a modification of the MACCSkeys._pyGenMACCSKeys() RDKit
Python function, combined with a function I wrote some time ago to count
non-overlapping matches in a
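To illustrate what counting non-overlapping matches can mean, here is a small greedy sketch. It is not the code from the gist above; it assumes the matches arrive as tuples of atom indices (the shape returned by RDKit's GetSubstructMatches) and the name count_non_overlapping is hypothetical.

```python
def count_non_overlapping(matches):
    """Count matches that share no atom index with an earlier accepted match.

    Greedy in the order given, so the result is not guaranteed to be the
    largest possible set of disjoint matches.
    """
    used = set()
    count = 0
    for match in matches:
        if used.isdisjoint(match):
            used.update(match)
            count += 1
    return count
```

For example, the matches (0, 1), (1, 2) and (2, 3) give a count of 2, because the middle match shares atom 1 with the first one and is skipped.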
Hi,
On second thoughts, the KNIME RDKit Substructure Counter node does a lot of
double counting, so it's not a useful tool for counting MACCS
keys.
Anyone got any better ideas?
Cheers,
mike
From: Mike Mazanetz
Sent: 08 September 2020 18:42
To:
Hi folks,
I found that I can always use the KNIME nodes to count these, so no need to
reply.
Best,
mike
From: Mike Mazanetz
Sent: 08 September 2020 13:30
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] MACCS keys
Hello Forum,
Does anyone know whether it's
Hi All,
thanks for the suggestions.
Greg, that's part of what's needed but there's also some more complex logic
needed. For instance, if the atom the H is attached to is rotatable (e.g. an
OH group) then it is more complex than if it is fixed (e.g. an N in a ring).
I was wondering whether anyone had
Hello Forum,
Does anyone know whether it's possible to obtain not just the fingerprint keys
for MACCS (binary values) but the number of occurrences of the keys,
particularly these details:
Thanks,
mike
1: #isotopes
2: #atoms with atomic number > 103
3: #group IVA, VA and VIA periods 4-6
4:
Hi Tim,
Assuming that you already have the indices of the atoms that you're
interested in looking at, it's pretty easy to calculate the angle between
three arbitrary atoms. Here's an example:
In [3]: m = Chem.AddHs(Chem.MolFromSmiles('COCO'))
In [4]: AllChem.EmbedMolecule(m)
Out[4]: 0
In
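For reference, the geometry behind an angle between three atoms can be sketched in plain Python. This assumes each atom position is available as an (x, y, z) tuple, for instance read from the embedded conformer; angle_deg is a hypothetical helper and is not the code from the truncated session above.

```python
import math


def angle_deg(a, b, c):
    """Return the angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("vertex coincides with one of the end points")
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))
```

With b as the central atom, two perpendicular bonds give 90 degrees and two collinear, opposite bonds give 180 degrees.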
Hi Tim,
Also not a solution within RDKit, but maybe of help:
The CSD Python API has a lot of functions around hbonds:
https://downloads.ccdc.cam.ac.uk/documentation/API/modules/molecule_api.html?highlight=hbond#ccdc.molecule.Molecule.hbonds
Hope this helps,
Andy
On Mon, Sep 7, 2020 at 3:07
Hi Tim,
I don’t have any code, but if you go to
https://github.com/harryjubb/arpeggio and look in config.py there are
SMARTS definitions for various interaction types with geometric tests that
might help. If you already have a suitable complex, you could just use
arpeggio.py to pull out the