date:20200908

Re: [Rdkit-discuss] Smallest possible size of 100*1e6 morgan fingerprints for storage and memory

2020-09-08 Thread Andrew Dalke

On Sep 9, 2020, at 04:00, Lewis Martin wrote:
> I'd like to keep it FOSS since it's for academic publication and hopefully to be re-used. Chemfp is amazing but brute-forcing 100 million by 100 million would surely still take a long time compared with an approximate nearest neighbor
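To put a number on the brute-force concern, here is a rough sketch of the pair count, plus one pairwise comparison on byte-packed fingerprints. It assumes Tanimoto similarity, which the snippet above does not name. The helper `tanimoto` and the choice to score two empty fingerprints as 0.0 are illustrative assumptions, not chemfp or RDKit code.

```python
def tanimoto(a: bytes, b: bytes) -> float:
    """Tanimoto similarity of two equal-length, byte-packed fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    ia = int.from_bytes(a, "big")
    ib = int.from_bytes(b, "big")
    both = bin(ia & ib).count("1")    # bits set in both fingerprints
    either = bin(ia | ib).count("1")  # bits set in at least one fingerprint
    # Assumption: two all-zero fingerprints are scored as 0.0.
    return both / either if either else 0.0


n = 100 * 10**6                  # 100 million fingerprints
all_pairs = n * n                # full N x N comparison
unique_pairs = n * (n - 1) // 2  # each unordered pair counted once

similarity = tanimoto(bytes([0b00001111]), bytes([0b00111100]))
```

Even when each unordered pair is counted once, the exhaustive comparison is about 5e15 similarity evaluations, which is the scale that motivates an approximate index.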

Re: [Rdkit-discuss] Smallest possible size of 100*1e6 morgan fingerprints for storage and memory

2020-09-08 Thread Lewis Martin

OK, to sum it up: for me, writing to binary is a neat, fast, and low-storage solution for fingerprints. Example:

    o = open('fingerprints.bin', 'wb')
    gen_mo = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=64)
    for smi in tqdm_notebook(df['smiles']):
        mol = Chem.MolFromSmiles(smi)
        fp
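The snippet above is cut off, so the following is only a sketch of the general idea rather than the author's code: if every fingerprint is written as a fixed-width record, record i starts at byte offset i * RECORD_SIZE and can be read back with a single seek. The names `RECORD_SIZE`, `write_records` and `read_record` are hypothetical, and the records here are dummy bytes, not real fingerprints.

```python
import os
import tempfile

RECORD_SIZE = 8  # bytes per fingerprint: 64 bits, matching fpSize=64 above


def write_records(path, records):
    """Append fixed-width byte records to a binary file, one after another."""
    with open(path, "wb") as out:
        for rec in records:
            if len(rec) != RECORD_SIZE:
                raise ValueError("every record must be RECORD_SIZE bytes")
            out.write(rec)


def read_record(path, index):
    """Read the record at position `index` without loading the whole file."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)
        rec = f.read(RECORD_SIZE)
    if len(rec) != RECORD_SIZE:
        raise IndexError("record index out of range")
    return rec


# Dummy records standing in for packed fingerprints.
fps = [bytes([i] * RECORD_SIZE) for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fingerprints.bin")
    write_records(path, fps)
    file_size = os.path.getsize(path)
    third = read_record(path, 2)
```

With no per-record header or delimiter, the file size is exactly the record count times the record width.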

Re: [Rdkit-discuss] Smallest possible size of 100*1e6 morgan fingerprints for storage and memory

2020-09-08 Thread Greg Landrum

The most efficient (easy) way to store the fingerprints is using DataStructs.BitVectToBinaryText(). That will return a 64-byte binary string for a 512-bit fingerprint. FWIW: if you haven't seen the recent blog post about similarity searching with short fingerprints:
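The size arithmetic behind that figure can be checked with a small, RDKit-free sketch: 512 bits pack into 512 / 8 = 64 bytes, so 100 million fingerprints need about 6.4 GB. The helper `pack_bits` and its bit ordering within each byte are assumptions made for illustration, and need not match the layout that DataStructs.BitVectToBinaryText() produces.

```python
def pack_bits(on_bits, n_bits=512):
    """Pack on-bit indices into n_bits // 8 bytes.

    Assumed layout: bit i is stored in byte i // 8 at position i % 8.
    """
    if n_bits % 8:
        raise ValueError("n_bits must be a multiple of 8")
    buf = bytearray(n_bits // 8)
    for i in on_bits:
        if not 0 <= i < n_bits:
            raise ValueError("bit index out of range")
        buf[i >> 3] |= 1 << (i & 7)
    return bytes(buf)


record = pack_bits({0, 9, 511})          # a 512-bit fingerprint with three bits set
n_molecules = 100 * 10**6                # 100 million ligands
total_bytes = len(record) * n_molecules  # raw storage, no index or header
```

At this size the raw fingerprints are roughly the same size as the ~6 GB SMILES file mentioned in the original question.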

Re: [Rdkit-discuss] Smallest possible size of 100*1e6 morgan fingerprints for storage and memory

2020-09-08 Thread Lewis Martin

Cheers Francois - that might be the way to go actually. I'll try with 'bitstring' https://github.com/scott-griffiths/bitstring and I guess write the data as concatenated bitarrays in chunked binary files. I'd like to keep it FOSS since it's for academic publication and hopefully to be re-used.

Re: [Rdkit-discuss] h-bond geometry

2020-09-08 Thread Francois Berenger

On 09/09/2020 01:33, Tim Dudgeon wrote: Hi All, thanks for the suggestions. Greg, that's part of what's needed but there's also some more complex logic needed. For instance, if the atom the H is attached to is rotatable (e.g. an OH group) then it is more complex than if it is fixed (e.g. a N in a

Re: [Rdkit-discuss] Smallest possible size of 100*1e6 morgan fingerprints for storage and memory

2020-09-08 Thread Francois Berenger

On 09/09/2020 09:35, Lewis Martin wrote: Hi RDKit, Looking for advice on an RDKit-adjacent problem please. Ultimately I'd like to fit an approximate-nearest-neighbors index on a dataset of 100 million ligands, featurized by Morgan fingerprint. The text file of the SMILES is ~6 GB but this blows

[Rdkit-discuss] Smallest possible size of 100*1e6 morgan fingerprints for storage and memory

2020-09-08 Thread Lewis Martin

Hi RDKit, Looking for advice on an RDKit-adjacent problem please. Ultimately I'd like to fit an approximate-nearest-neighbors index on a dataset of 100 million ligands, featurized by Morgan fingerprint. The text file of the SMILES is ~6 GB but this blows out when loaded with pandas.read_csv() or

Re: [Rdkit-discuss] Rdkit-discuss] MACCS keys - revisited

2020-09-08 Thread Andrew Dalke

On Sep 8, 2020, at 14:30, Mike Mazanetz wrote:
> Does anyone know whether it’s possible to obtain not just the fingerprint keys for MACCS (binary values) but the number of occurrences of the keys, particularly these details:

The SMARTS patterns for most of the MACCS keys are available by:

Re: [Rdkit-discuss] Rdkit-discuss] MACCS keys - revisited

2020-09-08 Thread Paolo Tosco

Hi Mike, I put together a gist that might help: https://gist.github.com/ptosco/7bbad9e6441724e9638bc4093f48e31b This is basically a modification of the MACCSkeys._pyGenMACCSKeys() RDKit Python function, combined with a function I wrote some time ago to count non-overlapping matches in a
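The gist itself is not reproduced here, so the following is only a sketch of what counting non-overlapping matches can look like, not Paolo's function. It assumes the matches arrive as tuples of atom indices (the shape that RDKit's GetSubstructMatches returns) and keeps a match only if it shares no atom with a match already kept. The name `count_non_overlapping` is hypothetical, and this greedy pass depends on input order, so it is not guaranteed to find the largest possible non-overlapping set.

```python
def count_non_overlapping(matches):
    """Greedily count matches that share no atom index with an earlier kept match."""
    used = set()
    count = 0
    for match in matches:
        atoms = set(match)
        if used.isdisjoint(atoms):
            used |= atoms
            count += 1
    return count


# Hypothetical two-atom pattern matched along a chain, plus one separate hit.
matches = [(0, 1), (1, 2), (2, 3), (5, 6)]
n_hits = count_non_overlapping(matches)  # (1, 2) is skipped: it reuses atom 1
```

This addresses the double counting mentioned elsewhere in the thread: a plain count of all matches would report 4 here, while the non-overlapping count is 3.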

[Rdkit-discuss] Rdkit-discuss] MACCS keys - revisited

2020-09-08 Thread Mike Mazanetz

Hi, On second thoughts. The KNIME node does a lot of double counting for the RDKit Substructure Counter, so it's not a useful tool for counting MACCS keys. Anyone got any better ideas? Cheers, mike From: Mike Mazanetz Sent: 08 September 2020 18:42 To:

Re: [Rdkit-discuss] MACCS keys

2020-09-08 Thread Mike Mazanetz

Hi folks, I found that I can always use the KNIME nodes to count these, so no need to reply. Best, mike From: Mike Mazanetz Sent: 08 September 2020 13:30 To: rdkit-discuss@lists.sourceforge.net Subject: [Rdkit-discuss] MACCS keys Hello Forum, Does anyone know whether it's

Re: [Rdkit-discuss] h-bond geometry

2020-09-08 Thread Tim Dudgeon

Hi All, thanks for the suggestions. Greg, that's part of what's needed but there's also some more complex logic needed. For instance, if the atom the H is attached to is rotatable (e.g. an OH group) then it is more complex than if it is fixed (e.g. a N in a ring). I was wondering whether anyone had

[Rdkit-discuss] MACCS keys

2020-09-08 Thread Mike Mazanetz

Hello Forum, Does anyone know whether it's possible to obtain not just the fingerprint keys for MACCS (binary values) but the number of occurrences of the keys, particularly these details: Thanks, mike 1: #isotopes 2: #atoms with atomic number > 103 3: #group IVA, VA and VIA periods 4-6 4:

Re: [Rdkit-discuss] h-bond geometry

2020-09-08 Thread Greg Landrum

Hi Tim, Assuming that you already have the indices of the atoms that you're interested in looking at, it's pretty easy to calculate the angle between three arbitrary atoms. Here's an example:

In [3]: m = Chem.AddHs(Chem.MolFromSmiles('COCO'))
In [4]: AllChem.EmbedMolecule(m)
Out[4]: 0
In
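The example above is truncated, so here is a library-free sketch of the underlying geometry: the angle at a central atom, computed from three sets of 3D coordinates. The coordinates are made up for illustration, and `angle_deg` is a hypothetical helper. With an embedded RDKit conformer, rdMolTransforms.GetAngleDeg(conf, i, j, k) should give the same quantity directly from atom indices.

```python
import math


def angle_deg(a, b, c):
    """Angle in degrees at point b, between the vectors b->a and b->c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0 or n2 == 0:
        raise ValueError("coincident points: angle is undefined")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))


# Made-up coordinates: a perfectly linear donor-H...acceptor arrangement.
donor, hydrogen, acceptor = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)
linear = angle_deg(donor, hydrogen, acceptor)

# A right angle, as a sanity check.
right = angle_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

For a hydrogen-bond check, the middle point would be the hydrogen, with the donor and acceptor heavy atoms on either side.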

Re: [Rdkit-discuss] h-bond geometry

2020-09-08 Thread Tosstorff, Andreas via Rdkit-discuss

Hi Tim, also not a solution within RDKit, but maybe of help: The CSD Python API has a lot of functions around hbonds: https://downloads.ccdc.cam.ac.uk/documentation/API/modules/molecule_api.html?highlight=hbond#ccdc.molecule.Molecule.hbonds Hope this helps, Andy On Mon, Sep 7, 2020 at 3:07

Re: [Rdkit-discuss] h-bond geometry

2020-09-08 Thread David Cosgrove

Hi Tim, I don’t have any code, but if you go to https://github.com/harryjubb/arpeggio and look in config.py there are SMARTS definitions for various interaction types with geometric tests that might help. If you already have a suitable complex, you could just use arpeggio.py to pull out the

16 matches
