Hi Igor,

Thanks!  Maybe I am a throwback, but I prefer the command line to a GUI.
Still, I'll give it a whirl!  :)

If you are handling millions of molecules without issue, then my Python
skills are really, really rusty.  Or I shouldn't be using Python to
handle this much data.  :)
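
For a rough sense of scale, here is a back-of-the-envelope sketch (plain
arithmetic, nothing from RDKit) of what one triangle of pairwise distances
costs for 26,000 molecules.  The 8-byte pointer size assumes a 64-bit
CPython build, and condensed_size is just a helper name made up for this
estimate.

```python
import sys


def condensed_size(n):
    """Number of pairwise distances among n items (one triangle, no diagonal)."""
    return n * (n - 1) // 2


n = 26000
pairs = condensed_size(n)

# A Python list keeps one pointer per element (8 bytes assumed here), and
# each element is a separate float object whose size is measured, not assumed.
bytes_per_list_item = 8 + sys.getsizeof(1.0)
list_gb = pairs * bytes_per_list_item / 1e9

# The same values packed as 8-byte doubles (as a numpy array would hold them).
array_gb = pairs * 8 / 1e9

print(f"{pairs:,} distances")
print(f"as a Python list of floats: ~{list_gb:.1f} GB")
print(f"as packed doubles: ~{array_gb:.1f} GB")
```

So most of the memory goes into the intermediate Python list rather than
the final array, and any indexing slip that adds a full row of distances
per molecule roughly doubles it.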

Thanks for the info!
Matt


On Wed, Jul 16, 2014 at 3:31 PM, Igor Filippov <igor.v.filip...@gmail.com>
wrote:

> Matthew,
>
> Two lines of shameless self-promotion:
> This is exactly the kind of problem for Diversity Genie -
> http://www.diversitygenie.com/
> It uses the RDKit library underneath, but wraps it in a simple,
> easy-to-use GUI front-end.
>
> Best regards,
> Igor
>
>
> On Wed, Jul 16, 2014 at 6:18 PM, Matthew Lardy <mla...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have been playing with the diversity selection in RDKit.  I am running
>> through a set of ~26,000 molecules to pick a set of 200 diverse molecules.
>> I saw some examples of how to do this in Python (my variant of their script
>> below), but the memory consumption is massive.  I burned through ~15GB of
>> memory before I killed it off.  Is this about what others have seen, or
>> should I move to doing this in C++ or Java (assuming that others have seen
>> a significantly lower level of memory consumption)?
>>
>> Here is the script:
>>
>> import gzip
>>
>> from numpy import array
>> from rdkit import Chem
>> from rdkit import DataStructs
>> from rdkit.Chem import AllChem
>> from rdkit.Chem import Draw
>> from rdkit.SimDivFilters import rdSimDivPickers
>>
>> # Read the molecules, skipping records that RDKit could not parse
>> zims = [x for x in Chem.ForwardSDMolSupplier(gzip.open('a.sdf.gz'))
>>         if x is not None]
>>
>> zims_fps = [AllChem.GetMorganFingerprintAsBitVect(x, 2) for x in zims]
>>
>> # Flattened lower triangle of the distance matrix: the entries for
>> # fingerprint i are its distances to fingerprints 0..i-1
>> nfps = len(zims_fps)
>> dm = []
>> for i in range(1, nfps):
>>     dm.extend(DataStructs.BulkTanimotoSimilarity(
>>         zims_fps[i], zims_fps[:i], returnDistance=True))
>> dm = array(dm)
>> picker = rdSimDivPickers.MaxMinPicker()
>> ids = picker.Pick(dm, nfps, 200)
>> list(ids)
>>
>>
>>  Thanks in advance!
>> Matt
>>
>>
>> ------------------------------------------------------------------------------
>> Want fast and easy access to all the code in your enterprise? Index and
>> search up to 200,000 lines of code with a free copy of Black Duck
>> Code Sight - the same software that powers the world's largest code
>> search on Ohloh, the Black Duck Open Hub! Try it now.
>> http://p.sf.net/sfu/bds
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>
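
The quoted script computes every pairwise distance before picking anything.
A MaxMin selection only needs, for each molecule, the distance to its
nearest already-picked molecule, so the distances can be computed on demand.
The following is a minimal pure-Python sketch of that idea, not RDKit's
implementation: fingerprints are modelled as sets of on-bit indices, and
tanimoto_distance and maxmin_pick are hypothetical helper names.

```python
import random


def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def maxmin_pick(fps, pick_size, seed=0):
    """Greedy MaxMin selection that never stores the full distance matrix."""
    if pick_size > len(fps):
        raise ValueError("pick_size exceeds pool size")
    rng = random.Random(seed)
    first = rng.randrange(len(fps))
    picks = [first]
    # min_dist[i] is the distance from item i to its nearest picked item;
    # picked items are marked with -1.0 so they are never chosen again.
    min_dist = [tanimoto_distance(fp, fps[first]) for fp in fps]
    min_dist[first] = -1.0
    while len(picks) < pick_size:
        # Take the unpicked item that is farthest from everything picked.
        nxt = max(range(len(fps)), key=min_dist.__getitem__)
        picks.append(nxt)
        min_dist[nxt] = -1.0
        # Only distances to the newest pick can lower a stored minimum.
        for i, fp in enumerate(fps):
            if min_dist[i] >= 0.0:
                d = tanimoto_distance(fp, fps[nxt])
                if d < min_dist[i]:
                    min_dist[i] = d
    return picks


toy = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {20, 21}]
print(maxmin_pick(toy, 3, seed=0))
```

Memory stays proportional to the pool size, and picking 200 out of 26,000
takes about 26,000 x 200 distance evaluations instead of all ~338 million
pairs.  RDKit's MaxMinPicker also has LazyPick and LazyBitVectorPick
methods that work along these lines; check the documentation for your
release before relying on them.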