Matthew,
Two lines of shameless self-promotion:
This is exactly the kind of problem for Diversity Genie -
http://www.diversitygenie.com/
It uses the RDKit library underneath, but wraps it in a simple, easy-to-use
GUI front-end.
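
On the memory question itself, a rough back-of-the-envelope estimate may help. The sketch below only counts the pairwise distances for 26,000 molecules and compares two ways of holding them. The per-entry sizes are assumptions (64-bit CPython, 8-byte doubles), and the helper name n_pairs is made up for illustration, so treat the numbers as approximate:

```python
import sys


def n_pairs(n):
    """Number of entries in the lower triangle of an n x n distance matrix."""
    return n * (n - 1) // 2


n = 26000
pairs = n_pairs(n)  # 337,987,000 distances

# Assumption: 64-bit CPython, where each list slot is an 8-byte pointer and
# every float is a separate object of sys.getsizeof(0.0) bytes.
bytes_per_list_entry = 8 + sys.getsizeof(0.0)
list_gb = pairs * bytes_per_list_entry / 1e9

# The same values packed as 8-byte doubles (for example a float64 array).
packed_gb = pairs * 8 / 1e9

print(f"{pairs:,} distances")
print(f"as a Python list of floats: ~{list_gb:.1f} GB")
print(f"as packed 8-byte doubles:   ~{packed_gb:.1f} GB")
```

Under those assumptions, most of the cost comes from collecting the distances in a plain Python list before converting them, rather than from the picker itself.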
Best regards,
Igor
On Wed, Jul 16, 2014 at 6:18 PM, Matthew Lardy <mla...@gmail.com> wrote:
> Hi all,
>
> I have been playing with the diversity selection in RDKit. I am running
> through a set of ~26,000 molecules to pick a set of 200 diverse molecules.
> I saw some examples of how to do this in Python (my variant of their script
> below), but the memory consumption is massive. I burned through ~15GB of
> memory before I killed it off. Is this about what others have seen, or
> should I move to doing this in C++ or Java (assuming that others have seen
> a significantly lower level of memory consumption)?
>
> Here is the script:
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> from rdkit import DataStructs
> import gzip
> import numpy as np
> from rdkit.Chem import Draw
> from rdkit.SimDivFilters import rdSimDivPickers
>
> # Read the molecules, skipping records that fail to parse
> zims = [x for x in Chem.ForwardSDMolSupplier(gzip.open('a.sdf.gz'))
>         if x is not None]
>
> # Morgan fingerprints (radius 2) as bit vectors
> zims_fps = [AllChem.GetMorganFingerprintAsBitVect(x, 2) for x in zims]
>
> # Lower triangle of the distance matrix, built row by row
> # (the demo used only 1000, in the interest of time)
> dm = []
> for i, fp in enumerate(zims_fps[:26000]):
>     dm.extend(DataStructs.BulkTanimotoSimilarity(fp, zims_fps[:i],
>                                                  returnDistance=True))
> dm = np.array(dm)
> picker = rdSimDivPickers.MaxMinPicker()
> ids = picker.Pick(dm, 26000, 200)
> list(ids[:200])
>
>
> Thanks in advance!
> Matt
>
>
> ------------------------------------------------------------------------------
> Want fast and easy access to all the code in your enterprise? Index and
> search up to 200,000 lines of code with a free copy of Black Duck
> Code Sight - the same software that powers the world's largest code
> search on Ohloh, the Black Duck Open Hub! Try it now.
> http://p.sf.net/sfu/bds
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
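
For reference, MaxMin selection does not strictly need the whole distance matrix in memory. The sketch below is a minimal pure-Python illustration of the idea, not RDKit's implementation: it keeps one running "distance to nearest pick" per candidate and computes distances on demand. The names tanimoto_distance and maxmin_pick are hypothetical, fingerprints are represented as sets of on-bit indices, and the first pick defaults to index 0 rather than a random seed. As far as I know, RDKit's MaxMinPicker also has a LazyPick method that takes a distance function instead of a matrix, which is the route to try with real fingerprints.

```python
import random


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def maxmin_pick(fps, n_picks, first=0):
    """Greedy MaxMin selection without storing the full distance matrix.

    Keeps one number per candidate: its distance to the nearest item picked
    so far. Memory is O(n); time is O(n * n_picks) distance evaluations.
    """
    if not fps or n_picks <= 0:
        return []
    n_picks = min(n_picks, len(fps))
    picks = [first]
    # Distance from every candidate to the closest pick so far
    min_dist = [tanimoto_distance(fp, fps[first]) for fp in fps]
    min_dist[first] = -1.0  # negative value marks "already picked"
    while len(picks) < n_picks:
        # Next pick: the candidate farthest from everything chosen so far
        nxt = max(range(len(fps)), key=min_dist.__getitem__)
        picks.append(nxt)
        for i, fp in enumerate(fps):
            if min_dist[i] >= 0.0:
                min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[nxt]))
        min_dist[nxt] = -1.0
    return picks


# Toy data: 500 random "fingerprints" with 8 on-bits out of 64
rng = random.Random(42)
toy_fps = [frozenset(rng.sample(range(64), 8)) for _ in range(500)]
picked = maxmin_pick(toy_fps, 10)
print(picked)
```

For 26,000 molecules and 200 picks this approach needs roughly 26,000 x 200 distance evaluations and a single list of 26,000 floats, at the cost of recomputing distances instead of looking them up.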